ALTERATION OF AMINO ACID COMPOSITIONS IN SEEDS

BACKGROUND OF THE INVENTION 

Feed formulations based on crop plants must typically be supplemented with 
specific amino acids to provide animals with essential nutrients which are necessary for
their growth. This supplementation is necessary because, in general, crop plants contain 
low proportions of several amino acids which are essential for, and cannot be synthesized 
by, monogastric animals. 

The seeds of crop plants contain different classes of seed proteins. The amino acid 
composition of these seeds reflects the composition of the prevalent classes of proteins.
Amino acid limitations are usually due to amino acid deficiencies of these prevalent 
protein classes. 

Among the amino acids necessary for animal nutrition, those that are of limited 
availability in crop plants include methionine, lysine, and threonine. Attempts to increase
the levels of these amino acids by breeding, mutant selection, and/or changing the
composition of the storage proteins accumulated in the seeds of crop plants have met with
limited success, or were accompanied by a loss in yield.

For example, although seeds of corn plants containing a mutant transcription factor
(opaque 2), or a mutant α-zein gene (floury 2), exhibit elevated levels of total and bound
lysine, they have an altered seed endosperm structure which is more susceptible to damage
and pests. Significant yield losses are also typical.

An alternative means to enhance levels of free amino acids in a crop plant is the
modification of amino acid biosynthesis in the plant. The introduction of a
feedback-regulation-insensitive dihydrodipicolinic acid synthase ("DHDPS") gene, which
encodes an enzyme that catalyzes the first reaction unique to the lysine biosynthetic pathway,
into plants has resulted in an increase in the levels of free lysine in the leaves and seeds of
those plants. An increase in the level of free lysine in the embryo results in a reduced amount
of oil in the seed. Further, free lysine can be lost during the wet milling process, reducing
the feed value of the gluten product of the process.

The expression of the lysC gene, which encodes a mutant bacterial aspartate kinase
that is desensitized to feedback inhibition by lysine and threonine, from a seed-specific
promoter in tobacco plants, has resulted in an increase in methionine and threonine
biosynthesis in the seeds of those plants. See Karchi, et al.; The Plant J.; Vol. 3; p. 721;
(1993). However, expression of the lysC gene results in only a 6-7% increase in the level
of total threonine or methionine in the seed. The expression of the lysC gene in seeds has a
minimal impact on the nutritional value of those seeds and, thus, supplementation of feed
containing lysC transgenic seeds with amino acids, such as methionine and threonine, is
still required.

There are additional molecular genetic strategies available for enhancing the amino 
acid quality of plant proteins. Each involves molecular manipulation of plant genes and 
the generation of transgenic plants. 

Protein sequence modification involves the identification of a gene encoding a
major protein, preferably a storage protein, as the target for modification to contain more
codons of essential amino acids. An important aspect of this approach is to be able to
select a region of the protein that can be modified without affecting the overall structure,
stability, function, and other cellular and nutritional properties of the protein.

The development of DNA synthesis technology allows the design and synthesis of
a gene encoding a new protein with desirable essential amino acid compositions. For
example, researchers have synthesized a 292-base pair DNA sequence encoding a
polypeptide composed of 80% essential amino acids and used it with the nopaline
synthetase (NOS) promoter to construct a chimeric gene. Expression of this gene in the
tuber of transgenic potato has resulted in an accumulation of this protein at a level of
0.02% to 0.35% of the total plant protein. This low level accumulation is possibly due to
the weak NOS promoter and/or the instability of the new protein.

Tobacco has been used as a test plant to demonstrate the feasibility of this approach
by transferring a chimeric gene containing the bean phaseolin promoter and the cDNA of a
sulfur-rich protein, Brazil Nut Protein ("BNP") (18 mol% methionine and 8 mol%
cysteine), into tobacco. Amino acid analysis indicates that the methionine content in the
transgenic seeds is enhanced by 30% over that of the untransformed seeds. This same
chimeric gene has also been transferred into a commercial crop, canola, and similar levels
of enhancement were achieved.

However, an adverse effect is that lysine content decreases. Additionally, BNP has
been identified as a major food allergen. Thus it is neither practical nor desirable to use
BNP to enhance the nutritional value of crop plants.

Thus, there is a need to improve the nutritional value of plant seeds. The genetic
modification should not be accompanied by detrimental side effects such as allergenicity,
anti-nutritional quality or poor yield.



SUMMARY OF THE INVENTION

It is an object of the present invention to provide a seed, the endosperm of which 
contains elevated levels of an essential amino acid. 

It is a further object of the present invention to provide methods for increasing the
nutritional value of feed.

It is a further object of the present invention to provide methods for genetically
modifying seeds so as to increase amounts of essential amino acids which are present in
relatively low amounts in unmodified seeds.

It is a further object of the present invention to provide methods for increasing the 
nutritional content of seeds without detrimental side effects such as allergenicity or
anti-nutritional quality.

It is a further object of the present invention to provide methods for increasing the 
nutritional content of seeds while maintaining a high yield. 

It is a further object of the present invention to provide a method for the expression 
of a polypeptide in a seed having levels of a preselected amino acid sufficient to reduce or 
obviate feed supplementation.

According to the present invention a transformed plant seed is provided, the 
endosperm of which is characterized as having an elevated level of at least one preselected 
amino acid compared to a seed from a corresponding plant which has not been 
transformed, wherein the amino acid is lysine, threonine, or tryptophan and optionally a 
sulfur-containing amino acid.

Also provided is a seed from a plant which has been transformed to express a 
heterologous protein in the endosperm of the seed, wherein the seed exhibits an elevated 
level of an essential amino acid. 

An expression cassette is also provided comprising a seed endosperm-preferred 
promoter operably linked to a structural gene encoding a polypeptide having an elevated
level of a preselected amino acid. Transformed plants and seeds containing the expression
cassette are also provided.

A method for elevating the level of a preselected amino acid in the endosperm of
plant seed is also provided. The method comprises the transformation of plant cells by 
introducing the expression cassette, recovering the transformed cells, regenerating a 
transformed plant and collecting the seeds therefrom. 

DETAILED DESCRIPTION OF THE INVENTION 

As used herein, a "structural gene" means an exogenous or recombinant DNA
sequence or segment that encodes a polypeptide. 

As used herein, "recombinant DNA" is a DNA sequence or segment that has been 
isolated from a cell, purified, synthesized or amplified.

As used herein, "isolated" means either physically isolated from the cell or 
synthesized in vitro on the basis of the sequence of an isolated DNA segment. 

As used herein, the term "increased" or "elevated" levels of the preselected amino 
acid in a protein means that the protein contains an elevated amount of a preselected amino 
acid compared to the amount in an average protein.

As used herein, "increased" or "elevated" levels or amounts of preselected amino 
acids in a transformed plant or seed are levels which are greater than the levels or amounts 
in the corresponding untransformed plant or seed. 

As used herein, "polypeptide" means proteins, protein fragments, modified 
proteins, amino acid sequences and synthetic amino acid sequences.

As used herein, "transformed plant" means a plant which comprises a structural 
gene which is introduced into the genome of the plant by transformation. 

As used herein, "untransformed plant" refers to a wild type plant, i.e., one where 
the genome has not been altered by the introduction of the structural gene.

As used herein, "plant" includes but is not limited to plant cells, plant tissue and
plant seeds.

As used herein, "seed endosperm-preferred promoter" is a promoter which 
preferentially promotes expression of the structural gene in the endosperm of the seed. 

As used herein with respect to a structural gene encoding a polypeptide, the term 
"expresses" means that the structural gene is incorporated into the genome of cells, so that
the product encoded by the structural gene is produced within the cells.

As used herein, the term "essential amino acid" means an amino acid which is
synthesized only by plants or microorganisms or which is not produced by animals in 
sufficient quantities to support normal growth and development. 

As used herein, the term "high lysine content protein" means that the protein has at 
least about 7 mole % lysine, preferably about 7 mole % to about 50 mole % lysine, more
preferably about 7 mole % to about 40 mole % lysine and most preferably about 7 mole % 
to about 30 mole %. 

As used herein, the term "high sulfur content protein" means that the protein 
contains at least about 6 mole % methionine and/or cysteine, preferably about 6 mole % to 
about 40 mole %, more preferably about 6 mole % to about 30 mole % and most
preferably 6 mole % to 25 mole %. 
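
As an illustration only, the two content definitions above can be written as a simple
calculation over a one-letter amino acid sequence. The function names and the test
sequences are hypothetical and are not part of the disclosure; only the thresholds of
about 7 mole % lysine and about 6 mole % methionine and/or cysteine come from the
definitions.

```python
from collections import Counter


def mole_percent(sequence: str, residues: str) -> float:
    """Percentage of positions in `sequence` occupied by any residue in `residues`.

    Both arguments use one-letter amino acid codes.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return 100.0 * sum(counts[r] for r in set(residues.upper())) / len(seq)


def is_high_lysine(sequence: str, threshold: float = 7.0) -> bool:
    """True if lysine (K) makes up at least `threshold` mole % of the sequence."""
    return mole_percent(sequence, "K") >= threshold


def is_high_sulfur(sequence: str, threshold: float = 6.0) -> bool:
    """True if methionine (M) plus cysteine (C) reach `threshold` mole %."""
    return mole_percent(sequence, "MC") >= threshold
```

Because the definitions say "about", the hard cut-offs used here are a simplification; a
real assessment would treat values near the thresholds with judgment.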

The present invention provides a transformed plant seed, the endosperm of which is 
characterized as having an elevated level of a preselected amino acid compared to the seed 
of a corresponding plant which has not been transformed. It is preferred that the level of 
preselected amino acid is elevated in the endosperm in preference to other parts of the
seed. 

The preselected amino acid is an essential amino acid such as lysine, cysteine,
methionine, threonine, tryptophan, arginine, valine, leucine, isoleucine, histidine or
combinations thereof; preferably, the preselected amino acid is lysine, threonine, cysteine,
tryptophan, or combinations thereof and optionally methionine. It is especially preferred
that the polypeptide has an increased content of lysine as well as a sulfur-containing amino
acid, i.e., methionine and/or cysteine.

The polypeptide can be an endogenous or heterologous protein. When an
endogenous protein is expressed, the preselected amino acid is lysine, cysteine, threonine,
tryptophan, arginine, valine, leucine, isoleucine, histidine or combinations thereof and
optionally methionine. When the protein is a heterologous protein, any of the above
described preselected amino acids or combinations thereof is present in elevated amounts.

Generally the amount of preselected amino acid in the seed of the present invention
is at least about 10 percent by weight greater than in a corresponding untransformed seed,
preferably about 10 percent by weight to about 10 times greater, more preferably about 15
percent by weight to about 10 times greater and most preferably about 20 percent to about
10 times greater.
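
The comparison described above is a relative increase over the untransformed seed. A
minimal sketch of that arithmetic follows; the function names and the measured values
passed to them are hypothetical, and the 10 percent default simply mirrors the general
minimum stated in the text.

```python
def percent_increase(transformed: float, untransformed: float) -> float:
    """Relative increase of the transformed seed over the untransformed seed, in percent.

    Both arguments are amounts of the preselected amino acid by weight,
    expressed in the same units.
    """
    if untransformed <= 0:
        raise ValueError("untransformed amount must be positive")
    return 100.0 * (transformed - untransformed) / untransformed


def meets_general_minimum(transformed: float, untransformed: float,
                          minimum_percent: float = 10.0) -> bool:
    """True if the increase reaches the general minimum (about 10 percent by weight)."""
    return percent_increase(transformed, untransformed) >= minimum_percent
```

Under this check, an increase on the order of the 6-7% reported for lysC expression in
the background section would fall below the general minimum.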



WO 99/40209 PCT/US99/02061 

A polypeptide having an elevated amount of the preselected amino acid is 
expressed in the transformed plant seed endosperm in an amount sufficient to increase the 
amount of at least one preselected amino acid in the seed of the transformed plant, relative 
to the amount of the preselected amino acid in the seed of a corresponding untransformed 
plant.

The choice of the structural gene is based on the desired amino acid composition of 
the polypeptide encoded by the structural gene, and the ability of the polypeptide to 
accumulate in seeds. The amino acid composition of the polypeptide can be manipulated 
by methods, such as site-directed mutagenesis of the structural gene encoding the 
polypeptide, so as to result in expression of a polypeptide that is increased in the amount of
a particular amino acid. For example, site-directed mutagenesis can be used to increase
levels of lysine, methionine, cysteine, threonine and/or tryptophan and/or to decrease
levels of asparagine and/or glutamine.

Such derivatives differ from the wild-type protein by one or more amino acid
substitutions, insertions, deletions or the like. Typically, amino acid substitutions are
conservative. In the regions of homology to the native sequence, variants preferably have
at least 90% amino acid sequence identity, more preferably at least 95% identity.
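
The identity figures above can be illustrated with a short calculation over two sequences
that have already been aligned. This is a sketch under stated assumptions: the function
name is hypothetical, gaps are written as "-", and identity is taken as matching residues
divided by the number of aligned columns without a gap, which is one of several
conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap ("-") are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no aligned residue pairs to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For example, a variant that differs from a ten-residue region of the native sequence at a
single position has 90% identity over that region.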

Typical examples of suitable proteins include barley chymotrypsin inhibitor, barley 
alpha hordothionin, soybean 2S albumin proteins, rice high methionine protein and 
sunflower high methionine protein and derivatives of each protein.

Barley alpha hordothionin has been modified to increase the level of particular 
amino acids. The sequences of genes which express modified alpha hordothionin proteins 
with enhanced essential amino acids are based on the mRNA sequence of the native 
Hordeum vulgare alpha hordothionin gene (accession number X05901; Ponz et al., 1986,
Eur. J. Biochem. 156:131-135).

Modified hordothionin proteins are described in U.S. Ser. Nos. 08/838,763 filed 
April 10, 1997; 08/824,379 filed March 26, 1997; 08/824,382 filed March 26, 1997; and 
U.S. Pat. No. 5,703,409 issued December 30, 1997, the disclosures of which are
incorporated herein in their entirety by reference.

Alpha hordothionin is a 45-amino acid protein which is stabilized by four disulfide
bonds resulting from eight cysteine residues. In its native form, the protein is especially
rich in arginine and lysine residues, containing 5 residues (10%) of each. However, it is
devoid of the essential amino acid methionine.




Alpha hordothionin has been modified to increase the amount of various amino
acids such as lysine, threonine or methionine. The protein has been synthesized and the
three-dimensional structure determined by computer modeling. The modeling of the
protein predicts that the ten charged residues (arginine at positions 5, 10, 17, 19 and 30,
and lysine at positions 1, 23, 32, 38 and 45) all occur on the surface of the molecule. The
side chains of the polar amino acids (asparagine at position 11, glutamine at position 22
and threonine at position 41) also occur on the surface of the molecule. Furthermore, the
side chains of hydrophobic amino acids (such as leucine at positions 8, 15, 24 and 33
and valine at position 18) are also solvent-accessible.

Three-dimensional modeling of the protein indicates that the arginine residue
at position 10 is important to retention of the appropriate 3-dimensional structure and
possible folding through hydrogen bond interactions with the C-terminal residue of the
protein. A lysine, methionine or threonine substitution at that point would disrupt this
hydrogen bonding network, leading to a destabilization of the structure. The synthetic
peptide having this substitution could not be made to fold correctly, which supported this
analysis. Conservation of the arginine residue at position 10 provides a protein which
folds correctly.

Alpha hordothionin has been modified to contain 12 lysine residues in the mature
hordothionin peptide, referred to as HT12 (Rao et al., 1994, Protein Engineering
7(12):1485-1493 and WO 94/16078 published July 21, 1994). The disclosure of each of
these is incorporated herein by reference in its entirety.

Further analysis of substitutions which would not alter the 3-dimensional structure of
the molecule led to replacement of Asparagine-11, Glutamine-22 and Threonine-41 with
lysine residues with virtually no steric hindrance. The resulting compound contains 27%
lysine residues.
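
The 27% figure can be checked with a short tally, sketched here under stated assumptions.
Only the residues named in this description are filled in; every other position is left
as the placeholder "X" because the full sequence is not reproduced in this section. The
helper names are hypothetical, and the substituted positions (arginine at 5, 17, 19 and
30, plus the polar residues at 11, 22 and 41, with arginine 10 conserved) are inferred
from the surrounding description of the 12-lysine variant.

```python
LENGTH = 45  # residues in mature alpha hordothionin, per the text

# Residues named in the text, keyed by 1-based position.
NAMED_RESIDUES = {
    1: "K", 23: "K", 32: "K", 38: "K", 45: "K",   # native lysines
    5: "R", 10: "R", 17: "R", 19: "R", 30: "R",   # native arginines
    11: "N", 22: "Q", 41: "T",                    # surface polar residues
}


def build_sequence(length: int, named: dict) -> str:
    """Sequence with named residues in place and 'X' at every other position."""
    return "".join(named.get(pos, "X") for pos in range(1, length + 1))


def substitute(sequence: str, positions: list, new_residue: str) -> str:
    """Return a copy of `sequence` with `new_residue` at each 1-based position."""
    residues = list(sequence)
    for pos in positions:
        residues[pos - 1] = new_residue
    return "".join(residues)


native = build_sequence(LENGTH, NAMED_RESIDUES)
# Arginine 10 is deliberately left untouched.
lysine_enriched = substitute(native, [5, 11, 17, 19, 22, 30, 41], "K")

lysine_count = lysine_enriched.count("K")
lysine_percent = 100.0 * lysine_count / LENGTH
print(lysine_count, round(lysine_percent))  # prints: 12 27
```

Five native lysines plus seven substitutions give 12 of 45 residues, about 26.7%, which
rounds to the 27% stated above.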

Other combinations of these substitutions were also made, including variants in which the
amino acid residues at one or more of positions 5, 11, 17, 19, 22, 30 and 41 are lysine, and
the remainder of the residues at those positions are the residues at the corresponding
positions in the wild type hordothionin.

Since threonine is a polar amino acid, the surface polar amino acid residues,
asparagine at position 11 and glutamine at position 22, can be substituted; and the charged
amino acids, lysine at positions 1, 23, 32 and 38 and arginine at positions 5, 17, 19, and 30,
can also be substituted with threonine. The molecule can be synthesized by solid phase
peptide synthesis.

While the above sequence is illustrative of the present invention, it is not intended
to be a limitation. Threonine substitutions can also be performed at positions containing
charged amino acids. Only arginine at position 10 and lysine at position 45 are important
for maintaining the structure of the protein. One can also substitute at the sites having
hydrophobic amino acids. These include positions 8, 15, 18 and 24.

Since methionine is a hydrophobic amino acid, the surface hydrophobic amino acid
residues, leucine at positions 8, 15, and 33, and valine at position 18, were substituted with
methionine. The surface polar amino acids, asparagine at position 11, glutamine at
position 22 and threonine at position 41, were also substituted with methionine. The
molecule is synthesized by solid phase peptide synthesis and folds into a stable structure. It
has seven methionine residues (15.5%) and, including the eight cysteines, the modified
protein has a sulfur amino acid content of 33%.
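
The percentages above follow from counting residues over the 45-residue protein. The
sketch below reproduces that arithmetic; the variable and function names are
hypothetical, while the substituted positions and the cysteine count are taken from the
text.

```python
LENGTH = 45      # residues in alpha hordothionin, per the text
CYSTEINES = 8    # cysteine residues, per the text

# Positions substituted with methionine: hydrophobic (8, 15, 33, 18)
# and surface polar (11, 22, 41).
METHIONINE_POSITIONS = {8, 15, 33, 18, 11, 22, 41}


def share(count: int, length: int) -> float:
    """Residue count expressed as a percentage of the sequence length."""
    return 100.0 * count / length


methionine_percent = share(len(METHIONINE_POSITIONS), LENGTH)
sulfur_percent = share(len(METHIONINE_POSITIONS) + CYSTEINES, LENGTH)
print(f"{methionine_percent:.1f} {sulfur_percent:.1f}")  # prints: 15.6 33.3
```

Seven of 45 residues is about 15.6%, which the text reports as 15.5%, and 15 of 45
sulfur-containing residues is about 33%.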

While the above-described proteins are illustrative of suitable polypeptides which
can be expressed in the transformed plant, they are not intended to be a limitation.
Methionine substitutions can also be performed at positions containing charged amino acids.
Only arginine at position 10 is important for maintaining the structure of the protein through a
hydrogen-bonding network with serine at position 2 and lysine at position 45. Thus, one
can substitute methionine for lysine at positions 1, 23, 32, and/or 38, and for arginine at
positions 5, 17, 19 and/or 30.

Many other proteins are also appropriate. For example, the protein encoded by the
structural gene can be a lysine and/or sulfur rich seed protein, such as the soybean 2S
albumin described in U.S. Ser. No. 08/618,911 filed March 20, 1996, and the
chymotrypsin inhibitor from barley, Williamson et al., Eur. J. Biochem. 165: 99-106 (1987),
the disclosures of each of which are incorporated by reference.

Derivatives of these genes can be made by site-directed mutagenesis to increase the
level of preselected amino acids in the encoded polypeptide. For example, the gene
encoding the barley high lysine polypeptide (BHL) is derived from barley
chymotrypsin inhibitor, U.S. Ser. No. 08/740,682 filed November 1, 1996 and
PCT/US97/20441 filed October 31, 1997, the disclosures of each of which are incorporated
herein by reference. The gene encoding the enhanced soybean albumin (ESA) is
derived from soybean 2S albumin described in U.S. Ser. No. 08/618,911, the disclosure of
which is incorporated herein in its entirety by reference.

Other examples of sulfur-rich plant proteins within the scope of the invention
include plant proteins enriched in cysteine but not methionine, such as the wheat
endosperm purothionine (Mak and Jones; Can. J. Biochem.; Vol. 22; p. 83 J; (1976);
incorporated herein in its entirety by reference), the pea low molecular weight albumins
(Higgins, et al.; J. Biol. Chem.; Vol. 261; p. 11124; (1986); incorporated herein in its
entirety by reference) as well as 2S albumin genes from other organisms. See, for
example, Coulter, et al.; J. Exp. Bot.; Vol. 41; p. 1541; (1990); incorporated herein in its
entirety by reference.

Such proteins also include methionine-rich plant proteins such as from sunflower
seed (Lilley, et al.; In: Proceedings of the World Congress on Vegetable Protein
Utilization in Human Foods and Animal Feedstuffs; Applewhite, H. (ed.); American Oil
Chemists Soc.; Champaign, IL; pp. 497-502; (1989); incorporated herein in its entirety by
reference), corn (Pedersen, et al.; J. Biol. Chem.; Vol. 261; p. 6279; (1986); Kirihara, et al.;
Gene; Vol. 71; p. 359; (1988); both incorporated herein in their entirety by reference), and
rice (Musumura, et al.; Plant Mol. Biol.; Vol. 12; p. 123; (1989); incorporated herein in its
entirety by reference).

The present invention also provides a method for genetically modifying plants to 
increase the level of at least one preselected amino acid in the endosperm of the seed so as
to enhance the nutritional value of the seeds. 

The method comprises the introduction of an expression cassette into regenerable 
plant cells to yield transformed plant cells. The expression cassette comprises a seed 
endosperm-preferred promoter operably linked to a structural gene encoding a polypeptide 
elevated in content of the preselected amino acid.

A fertile transformed plant is regenerated from the transformed cells, and seeds are 
isolated from the plant. The structural gene is transmitted through a complete normal 
sexual cycle of the transformed plant to the next generation. 

The polypeptide is synthesized in the endosperm of seed of the plant which has 
been transformed by insertion of the expression cassette described above. The sequence
for the nucleotide molecule, either RNA or DNA, can readily be derived from the amino 
acid sequence for the selected polypeptide using standard reference texts. 
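
Deriving a nucleotide sequence from an amino acid sequence amounts to choosing one codon
for each residue from the standard genetic code. The sketch below is illustrative only:
the function name is hypothetical, and the single codon listed for each amino acid is an
arbitrary choice among the synonymous codons, not a codon preference taught by this
disclosure.

```python
# One codon per amino acid from the standard genetic code.
# The choice among synonymous codons here is arbitrary (an assumption).
CODON_FOR = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}
STOP_CODON = "TGA"


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Return a DNA coding sequence for `protein` (one-letter codes)."""
    codons = []
    for residue in protein.upper():
        if residue not in CODON_FOR:
            raise ValueError(f"unknown amino acid code: {residue!r}")
        codons.append(CODON_FOR[residue])
    if add_stop:
        codons.append(STOP_CODON)
    return "".join(codons)
```

In practice the codon chosen for each residue would usually be adjusted to the host
plant, and the corresponding RNA sequence is obtained by replacing T with U.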

Plants which can be used in the method of the invention include monocotyledonous
cereal plants. Preferred plants include maize, wheat, rice, barley, oats, sorghum, millet and 
rye. The most preferred plant is maize. 

Seeds derived from plants regenerated from transformed plant cells, plant parts or 
plant tissues, or progeny derived from the regenerated transformed plants, may be used
directly as feed or food, or further processing may occur. 

Transformation 

The transformation of plants in accordance with the invention may be carried out in 
essentially any of the various ways known to those skilled in the art of plant molecular
biology. These include, but are not limited to, microprojectile bombardment, 
microinjection, electroporation of protoplasts or cells comprising partial cell walls, and 
Agrobacterium-mediated DNA transfer. 

I. DNA Used for Transformation

DNA useful for introduction into plant cells includes DNA that has been derived or 
isolated from any source, that may be subsequently characterized as to structure, size 
and/or function, chemically altered, and later introduced into the plant. 

An example of DNA "derived" from a source would be a DNA sequence or
segment that is identified as a useful fragment within a given organism, and which is then
synthesized in essentially pure form. An example of such DNA "isolated" from a source
would be a useful DNA sequence that is excised or removed from the source by chemical
means, e.g., by the use of restriction endonucleases, so that it can be further manipulated,
e.g., amplified, for use in the invention, by the methodology of genetic engineering.

Therefore, useful DNA includes completely synthetic DNA, semi-synthetic DNA,
DNA isolated from biological sources, and DNA derived from RNA. The DNA isolated
from biological sources, or DNA derived from RNA, includes, but is not limited to, DNA
or RNA from plant genes, and non-plant genes such as those from bacteria, yeasts, animals
or viruses. The DNA or RNA can include modified genes, portions of genes, or chimeric
genes, including genes from the same or different genotype.

The term "chimeric gene" or "chimeric DNA" is defined as a gene or DNA 
sequence or segment comprising at least two DNA sequences or segments from species 
which do not recombine DNA under natural conditions, or which DNA sequences or 
segments are positioned or linked in a manner which does not normally occur in the native
genome of an untransformed plant.

A structural gene of the invention can be identified by standard methods, e.g., 
enrichment protocols, or probes, directed to the isolation of particular nucleotide or amino 
acid sequences. The structural gene can be identified by obtaining and/or screening of a
DNA or cDNA library generated from nucleic acid derived from a particular cell type, cell 
line, primary cells, or tissue. 

Screening for DNA fragments that encode all or a portion of the structural gene can 
be accomplished by screening plaques from a genomic or cDNA library for hybridization 
to a probe of the structural gene from other organisms or by screening plaques from a
cDNA expression library for binding to antibodies that specifically recognize the 
polypeptide encoded by the structural gene. 

DNA fragments that hybridize to a structural gene probe from other organisms 
and/or plaques carrying DNA fragments that are immunoreactive with antibodies to the 
polypeptide encoded by the structural gene can be subcloned into a vector and sequenced
and/or used as probes to identify other cDNA or genomic sequences encoding all or a 
portion of the structural gene. 

Portions of the genomic copy or copies of the structural gene can be partially 
sequenced and identified by standard methods including either DNA sequence homology 
to other homologous genes or by comparison of encoded amino acid sequences to known
polypeptide sequences. 

Once portions of the structural gene are identified, complete copies of the structural 
gene can be obtained by standard methods, including cloning or polymerase chain reaction 
(PCR) synthesis using oligonucleotide primers complementary to the structural gene. The 
presence of an isolated full-length copy of the structural gene can be verified by
comparison of its deduced amino acid sequence with the amino acid sequence of native
polypeptide sequences.

As discussed above, the structural gene encoding the polypeptide can be modified 
to increase the content of particular amino acid residues in that polypeptide by methods 
well known to the art, including, but not limited to, site-directed mutagenesis. Thus,
derivatives of naturally occurring polypeptides can be made by nucleotide substitution of
the structural gene so as to result in a polypeptide having a different amino acid at the
position in the polypeptide which corresponds to the codon with the nucleotide
substitution. The introduction of multiple amino acid changes in a polypeptide can result
in a polypeptide which is significantly enriched in a preselected amino acid. 
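
The relationship between a nucleotide substitution and the resulting amino acid change
can be sketched as follows. This models only the sequence-level outcome, not the
laboratory procedure; the function names and the 12-nucleotide example are hypothetical,
and translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order; "*" is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def substitute_codon(dna: str, codon_position: int, new_codon: str) -> str:
    """Replace the codon at 1-based `codon_position` with `new_codon`."""
    new_codon = new_codon.upper()
    if new_codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {new_codon!r}")
    start = 3 * (codon_position - 1)
    if start < 0 or start + 3 > len(dna):
        raise IndexError("codon position is outside the sequence")
    return dna[:start] + new_codon + dna[start + 3:]


original = "AAGAACCAGACC"                         # encodes K N Q T
mutated = substitute_codon(original, 2, "AAG")    # AAC (Asn) -> AAG (Lys)
print(translate(original), translate(mutated))    # prints: KNQT KKQT
```

Here a single nucleotide change in the second codon converts asparagine to lysine, and
repeating the substitution at several codons yields a polypeptide enriched in the chosen
amino acid.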

As noted above, the choice of the polypeptide encoded by the structural gene will 
be based on the amino acid composition of the polypeptide and its ability to accumulate in 
seeds. The amino acid can be chosen for its nutritional value to produce a value-added
trait to the plant or plant part. Amino acids desirable for value-added traits, as well as a
source to limit synthesis of an endogenous protein, include, but are not limited to, lysine,
threonine, tryptophan, methionine, and cysteine. 

Expression Cassettes and Expression Vectors

According to the present invention, a structural gene is identified, isolated, and 
combined with a seed endosperm-preferred promoter to provide a recombinant expression 
cassette. 

The construction of such expression cassettes which can be employed in
conjunction with the present invention is well known to those of skill in the art in light of
the present disclosure. See, e.g., Sambrook, et al.; Molecular Cloning: A Laboratory
Manual; Cold Spring Harbor, New York; (1989); Gelvin, et al.; Plant Molecular Biology
Manual; (1990); Plant Biotechnology: Commercial Prospects and Problems, eds. Prakash,
et al.; Oxford & IBH Publishing Co.; New Delhi, India; (1993); and Heslot, et al.;
Molecular Biology and Genetic Engineering of Yeasts; CRC Press, Inc., USA; (1992);
each incorporated herein in its entirety by reference.

Preferred promoters useful in the practice of the invention are those seed 
endosperm-preferred promoters that allow expression of the structural gene selectively in 
seed endosperm to avoid any potential deleterious effects associated with the expression of 
the structural gene in the embryo.

It has been found that when endosperm-preferred promoters are employed, the total 
level of the preselected amino acid in the seed is increased compared to a seed produced by 
employing an embryo-preferred promoter, such as the globulin 1 promoter. When the
globulin 1 promoter is employed, the polypeptide is expressed by the structural gene, but
the total amount of the preselected amino acid is not increased.

Examples of suitable promoters include, but are not limited to, the 27 kD gamma zein
promoter and the waxy promoter. See the following citations relating to the 27 kD gamma zein
promoter: Boronat, A., Martinez, M.C., Reina, M., Puigdomenech, P. and Palau, J., Isolation
and sequencing of a 28 kD glutelin-2 gene from maize: Common elements in the 5'
flanking regions among zein and glutelin genes, Plant Sci. 47, 95-102 (1986) and
Reina, M., Ponte, I., Guillen, P., Boronat, A. and Palau, J., Sequence analysis of a genomic
clone encoding a Zc2 protein from Zea mays W64 A, Nucleic Acids Res. 18 (21), 6426
(1990). See the following citation relating to the waxy promoter: Kloesgen, R.B., Gierl, A.,
Schwarz-Sommer, Zs. and Saedler, H., Molecular analysis of the waxy locus of Zea mays,
Mol. Gen. Genet. 203, 237-244 (1986). The disclosures of each of these are incorporated
herein by reference in their entirety.

However, other endosperm-preferred promoters can be employed. 

II. DELIVERY OF DNA TO CELLS 

The expression cassette or vector can be introduced into prokaryotic or eukaryotic
cells by currently available methods which are described in the literature. See, for example,
Weising et al., Ann. Rev. Genet. 22: 421-477 (1988). For example, the expression cassette
or vector can be introduced into plant cells by methods including, but not limited to,
Agrobacterium-mediated transformation, electroporation, PEG poration, microprojectile
bombardment, microinjection of plant cell protoplasts or embryogenic callus, silicon fiber
delivery, infectious viruses or viroids such as retroviruses, the use of liposomes and the
like, all in accordance with well-known procedures.

The introduction of DNA constructs using polyethylene glycol precipitation is
described in Paszkowski et al., Embo J. 3: 2717-2722 (1984). Electroporation techniques
are described in Fromm et al., Proc. Natl. Acad. Sci. 82: 5324 (1985). Ballistic
transformation techniques are described in Klein et al., Nature 327: 70-73 (1987). The
disclosure of each of these is incorporated herein in its entirety by reference.

Introduction and expression of foreign genes in plants has been shown to be 
possible using the T-DNA of the tumor-inducing (Ti) plasmid of Agrobacterium 
tumefaciens. Using recombinant DNA techniques and bacterial genetics, a wide variety of 
foreign DNAs can be inserted into T-DNA in Agrobacterium. Following infection by the 
bacterium containing the recombinant Ti plasmid, the foreign DNA is inserted into the host 
plant chromosomes, thus producing a genetically engineered cell and eventually a 
genetically engineered plant. A second approach is to introduce root-inducing (Ri) 
plasmids as the gene vectors. 

Agrobacterium tumefaciens-mediated transformation techniques are well described 
in the literature. See, for example, Horsch et al., Science 233: 496-498 (1984), and Fraley 
et al., Proc. Natl. Acad. Sci. 80: 4803 (1983). Agrobacterium transformation of maize is 
described in U.S. Patent No. 5,550,318. The disclosure of each of these is incorporated 
herein in its entirety by reference. 

Other methods of transfection or transformation include (1) Agrobacterium 
rhizogenes-mediated transformation (see, e.g., Lichtenstein and Fuller, In: Genetic 
Engineering, vol. 6, PWJ Rigby, Ed., London, Academic Press, 1987; and Lichtenstein, C. 
P., and Draper, J., In: DNA Cloning, Vol. II, D. M. Glover, Ed., Oxford, IRL Press, 1985; 
Application PCT/US87/02512 (WO 88/02405 published Apr. 7, 1988) describes the use of 
A. rhizogenes strain A4 and its Ri plasmid along with A. tumefaciens vectors pARC8 or 
pARC16); (2) liposome-mediated DNA uptake (see, e.g., Freeman et al., Plant Cell Physiol. 
25: 1353, 1984); and (3) the vortexing method (see, e.g., Kindle, Proc. Natl. Acad. Sci. USA 
87: 1228, 1990). The disclosure of each of these is incorporated herein in its entirety by 
reference. 

DNA can also be introduced into plants by direct DNA transfer into pollen as 
described by Zhou et al., Methods in Enzymology, 101:433 (1983); D. Hess, Intern. Rev. 
Cytol., 107:367 (1987); Luo et al., Plant Mol. Biol. Reporter, 6:165 (1988). The 
disclosure of each of these is incorporated herein in its entirety by reference. 

Expression of polypeptide coding genes can be obtained by injection of the DNA 
into reproductive organs of a plant as described by Pena et al., Nature, 325:274 (1987), 
the disclosure of which is incorporated herein in its entirety by reference. 

DNA can also be injected directly into the cells of immature embryos, or introduced 
during the rehydration of desiccated embryos, as described by Neuhaus et al., Theor. Appl. 
Genet., 75:30 (1987); and Benbrook et al., in Proceedings Bio Expo 1986, Butterworth, 
Stoneham, Mass., pp. 27-54 (1986). The disclosure of each of these is incorporated herein 
in its entirety by reference. 

Plant cells useful for transformation include cells cultured in suspension cultures, 
callus, embryos, meristem tissue, pollen, and the like. 

A variety of plant viruses that can be employed as vectors are known in the art and 
include cauliflower mosaic virus (CaMV), geminivirus, brome mosaic virus, and tobacco 
mosaic virus. 

Typical vectors useful for expression of genes in higher plants are well known in 
the art and include vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium 
tumefaciens described by Rogers et al., Meth. in Enzymol., 153:253-277 (1987), the 
disclosure of which is incorporated herein in its entirety by reference. These 
vectors are plant integrating vectors in that, on transformation, the vectors integrate a 
portion of vector DNA into the genome of the host plant. 

A particularly preferred vector is a plasmid, by which is meant a circular 
double-stranded DNA molecule which is not a part of the chromosomes of the cell. Exemplary A. 
tumefaciens vectors useful herein are plasmids pKYLX6 and pKYLX7 of Schardl et al., 
Gene, 61:1-11 (1987) and Berger et al., Proc. Natl. Acad. Sci. U.S.A., 86:8402-8406 
(1989). Another useful vector herein is plasmid pBI101.2 that is available from Clontech 
Laboratories, Inc. (Palo Alto, CA). The disclosure of each of these is incorporated herein 
in its entirety by reference. 

A cell in which the foreign genetic material in a vector is functionally expressed 
has been "transformed" by the vector and is referred to as a "transformant". 

Either genomic DNA or cDNA coding the gene of interest may be used in this 
invention. The gene of interest may also be constructed partially from a cDNA clone and 
partially from a genomic clone. 

When the gene of interest has been isolated, genetic constructs are made which 
contain the necessary regulatory sequences to provide for efficient expression of the gene 
in the host cell. 

According to this invention, the genetic construct will contain (a) a genetic 
sequence coding for the polypeptide of interest and (b) one or more regulatory sequences 
operably linked on either side of the structural gene of interest. Typically, the regulatory 
sequences will be a promoter or a terminator. The regulatory sequences may be from 
autologous or heterologous sources. 

The cloning vector will typically carry a replication origin, as well as specific 
genes that are capable of providing phenotypic selection markers in transformed host cells. 
Typically, genes conferring resistance to antibiotics or selected herbicides are used. After 
the genetic material is introduced into the target cells, successfully transformed cells 
and/or colonies of cells can be isolated by selection on the basis of these markers. 

Typical selectable markers include genes coding for resistance to the antibiotic 
spectinomycin (e.g., the aadA gene), the streptomycin phosphotransferase (SPT) gene 
coding for streptomycin resistance, the neomycin phosphotransferase (NPTII) gene 
encoding kanamycin or geneticin resistance, and the hygromycin phosphotransferase (HPT) 
gene coding for hygromycin resistance. 

Genes coding for resistance to herbicides include genes coding for resistance to 
herbicides which act to inhibit the action of acetolactate synthase (ALS), in particular the 
sulfonylurea-type herbicides (e.g., the acetolactate synthase (ALS) genes containing 
mutations leading to such resistance, in particular the S4 and/or Hra mutations), genes 
coding for resistance to herbicides which act to inhibit the action of glutamine synthase, 
such as phosphinothricin or basta (e.g., the pat or bar gene), or other such genes known in 
the art. The bar gene encodes resistance to the herbicide basta, and the ALS gene encodes 
resistance to the herbicide chlorsulfuron. 

Typically, an intermediate host cell will be used in the practice of this invention to 
increase the copy-number of the cloning vector. With an increased copy number, the 
vector containing the gene of interest can be isolated in significant quantities for 
introduction into the desired plant cells. 

Host cells that can be used in the practice of this invention include prokaryotes, 
including bacterial hosts such as E. coli, S. typhimurium, and Serratia marcescens. 
Eukaryotic hosts such as yeast or filamentous fungi may also be used in this invention. 
Since these hosts are also microorganisms, it will be essential to ensure that plant 
promoters which do not cause expression of the polypeptide in bacteria are used in the 
vector. 

The isolated cloning vector will then be introduced into the plant cell using any 
convenient transformation technique as described above. 

III. Regeneration and Analysis of Transformants 

Following transformation, regeneration is required to obtain a whole plant from 
transformed cells, and the presence of structural gene(s) or "transgene(s)" in the 
regenerated plant is detected by assays. The seed derived from the plant is then tested for 
levels of preselected amino acids. Depending on the type of plant and the level of gene 
expression, introduction of the structural gene into the plant seed endosperm can enhance 
the level of preselected amino acids in an amount useful to supplement the nutritional 
quality of those seeds. 

Using known techniques, protoplasts and cell or tissue culture can be regenerated to 
form whole fertile plants which carry and express the gene for a polypeptide according to 
this invention. 

Accordingly, a highly preferred embodiment of the present invention is a 
transformed maize plant, the cells of which contain at least one copy of the DNA sequence 
of an expression cassette containing a gene encoding a polypeptide containing elevated 
amounts of an essential amino acid, such as an HT12, BHL or ESA protein. 

Techniques for regenerating plants from tissue culture, such as transformed 
protoplasts or callus cell lines, are known in the art. For example, see Phillips, et al.; Plant 
Cell Tissue Organ Culture; Vol. 1; p. 123; (1981); Patterson, et al.; Plant Sci.; Vol. 42; p. 
125; (1985); Wright, et al.; Plant Cell Reports; Vol. 6; p. 83; (1987); and Barwale, et al.; 
Planta; Vol. 167; p. 473; (1986); each incorporated herein in its entirety by reference. The 
selection of an appropriate method is within the skill of the art. 

Examples of the practice of the present invention detailed herein relate specifically 
to maize plants. However, the present invention is also applicable to other cereal plants. 
The expression vectors utilized herein are demonstrably capable of operation in cells of 
cereal plants both in tissue culture and in whole plants. The invention disclosed herein is 
thus operable in monocotyledonous species to transform individual plant cells and to 
achieve full, intact plants which can be regenerated from transformed plant cells and which 
express preselected polypeptides. 

The introduced structural genes are expressed in the transformed plant cells and 
stably transmitted (somatically and sexually) to the next generation of cells produced. The 
vector should be capable of introducing, maintaining, and expressing a structural gene in 
plant cells. The structural gene is passed on to progeny by normal sexual transmission. 

To confirm the presence of the structural gene(s) or "transgene(s)" in the 
regenerating plants, or seeds or progeny derived from the regenerated plant, a variety of 
assays can be performed. Such assays include Southern and Northern blotting; PCR; 
assays that detect the presence of a polypeptide product, e.g., by immunological means 
(ELISAs and Western blots) or by enzymatic function; plant part assays, such as leaf, seed 
or root assays; and also, by analyzing the phenotype of the whole regenerated plant. 

Whereas DNA analysis techniques can be conducted using DNA isolated from any 
part of a plant, RNA will be expressed in the seed endosperm and hence it will be 
necessary to prepare RNA for analysis from these tissues. 

PCR techniques can be used for detection and quantitation of RNA produced from 
introduced structural genes. In this application of PCR it is first necessary to reverse 
transcribe RNA into DNA, using enzymes such as reverse transcriptase, and then through 
the use of conventional PCR techniques amplify the DNA. In most instances PCR 
techniques, while useful, will not demonstrate integrity of the RNA product. 

Further information about the nature of the RNA product may be obtained by 
Northern blotting. This technique will demonstrate the presence of an RNA species and 
give information about the integrity of that RNA. The presence or absence of an RNA 
species can also be determined using dot or slot blot Northern hybridizations. These 
techniques are modifications of Northern blotting and will only demonstrate the presence 
or absence of an RNA species. 

While Southern blotting and PCR may be used to detect the structural gene in 
question, they do not provide information as to whether the structural gene is being 
expressed. Expression may be evaluated by specifically identifying the polypeptide 
products of the introduced structural genes or evaluating the phenotypic changes brought 
about by their expression. 

Assays for the production and identification of specific polypeptides may make use 
of physical-chemical, structural, functional, or other properties of the polypeptides. 
Unique physical-chemical or structural properties allow the polypeptides to be separated 
and identified by electrophoretic procedures, such as native or denaturing gel 
electrophoresis or isoelectric focusing, or by chromatographic techniques such as ion 
exchange or gel exclusion chromatography. 

The unique structures of individual polypeptides offer opportunities for use of 
specific antibodies to detect their presence in formats such as an ELISA assay. 
Combinations of approaches may be employed with even greater specificity such as 
Western blotting in which antibodies are used to locate individual gene products that have 
been separated by electrophoretic techniques. 

Additional techniques may be employed to absolutely confirm the identity of the 
product of interest such as evaluation by amino acid sequencing following purification. 
Although these are among the most commonly employed, other procedures may be 
additionally used. 

Very frequently, the expression of a gene product is determined by evaluating the 
phenotypic results of its expression. These assays also may take many forms, including 
but not limited to, analyzing changes in the chemical composition, morphology, or 
physiological properties of the plant. In particular, the elevated preselected amino acid 
content due to the expression of structural genes encoding polypeptides can be detected by 
amino acid analysis. 

Breeding techniques useful in the present invention are well known in the art. 

The present invention will be further described by reference to the following 
detailed examples. It is understood, however, that there are many extensions, variations, 
and modifications on the basic theme of the present invention beyond that shown in the 
examples and description, which are within the spirit and scope of the present invention. 

Examples 

EXAMPLE 1 

Construction of the HT12 gene and of other genes encoding polypeptides having an 
elevated level of a preselected amino acid. 

As noted above, the sequence of the HT12 gene is based on the mRNA sequence of the 
native Hordeum vulgare alpha hordothionin gene (accession number X05901, Ponz et al. 
1986 Eur. J. Biochem. 156:131-135) modified to introduce 12 lysine residues into the 
mature hordothionin peptide (See Rao et al. 1994 Protein Engineering 7(12):1485-1493, 
and WO 94/16078 published July 21, 1994). 

The alpha hordothionin cDNA comprising the entire alpha hordothionin coding 
sequence is isolated by RT-PCR of mRNA from developing barley seed. Primers are 
designed based upon the published alpha hordothionin sequence to amplify the gene and to 
introduce an NcoI site at the start (ATG) codon and a BamHI site after the stop codon of the 
thionin coding sequence to facilitate cloning. 

Primers are designated as HTPCR1 (5'- 
AGTATAAGTAAACACACCATCACACCCTTGAGGCCCTTGCTGGTGGCCATGGTG-3') 
and HTPCR2 (5'- 
CCTCACATCCCTTAGTGCCTAAGTTCGACGTCGGGCCCTCTAGTCGACGGATCCA-3'). 
These primers are used in a PCR reaction to amplify alpha hordothionin by 
conventional methods. The resulting PCR product is purified and subcloned into the 
BamHI/NcoI digested pBSKP vector (Stratagene, La Jolla, CA) and sequenced on both 
strands to confirm its identity. The clone is designated pBSKP-HT (seq. ID 1). Primers 
are designed for single stranded DNA site-directed mutagenesis to introduce 12 codons for 
lysine, based on the peptide structure of hordothionin 12 (Ref: Rao et al. 1994 Protein 
Engineering 7(12):1485-1493) and are designated HT12mut1 (5'- 
AGCGGAAAATGCCCGAAAGGCTTCCCCAAATTGGC-3'), HT12mut2 (5'- 
TGCGCAGGCGTCTGCAAGTGTAAGCTGACTAGTAGCGGAAAATGC-3'), 
HT12mut3 (5'- 
TACAACCTTTGCAAAGTCAAAGGCGCCAAGAAGCTTTGCGCAGGCGTCTG-3'), and 
HT12mut4 (5'- 
GCAAGAGTTGCTGCAAGAGTACCCTGGGAAGGAAGTGCTACAACCTTTGC-3'). 
Sequence analysis is used to verify the desired sequence of the resulting plasmid, 
designated pBSKP-HT12 (seq. ID 2). 
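
By way of illustration only, the presence of the cloning sites in the two amplification primers can be checked with a short script. The sketch below is not part of the described method; the helper name find_sites is hypothetical, the recognition sequences used are the standard ones for NcoI (CCATGG) and BamHI (GGATCC), and the primer strings are copied from the text above.

```python
# Illustrative check only; find_sites is a hypothetical helper name.
NCOI = "CCATGG"    # standard NcoI recognition sequence
BAMHI = "GGATCC"   # standard BamHI recognition sequence

# Primer sequences as given in Example 1 (5' to 3').
HTPCR1 = "AGTATAAGTAAACACACCATCACACCCTTGAGGCCCTTGCTGGTGGCCATGGTG"
HTPCR2 = "CCTCACATCCCTTAGTGCCTAAGTTCGACGTCGGGCCCTCTAGTCGACGGATCCA"


def find_sites(seq, site):
    """Return the 0-based start positions of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    print("NcoI in HTPCR1 at:", find_sites(HTPCR1, NCOI))
    print("BamHI in HTPCR2 at:", find_sites(HTPCR2, BAMHI))
```

Both recognition sequences are palindromic, so scanning a single strand is sufficient for this check.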

Similarly, genes encoding other derivatives of hordothionin, as described above 
(See U.S. Ser. Nos. 08/838,763 filed April 10, 1997; 08/824,379 filed March 26, 1997; 
08/824,382 filed March 26, 1997; and U.S. Pat. No. 5,703,409 issued December 30, 1997), 
the gene encoding enhanced soybean albumin (ESA) (See U.S. Ser. No. 08/618,911), and 
genes encoding BHL and other derivatives of the barley chymotrypsin inhibitor (See U.S. 
Ser. No. 08/740,682 filed November 1, 1996 and PCT/US97/20441 filed October 31, 
1997) are constructed by site directed mutagenesis from pBSKP-HT, a subclone of the 
soybean 2S albumin 3 gene in the pBSKP vector (Stratagene, La Jolla, CA), and a subclone 
of the barley chymotrypsin inhibitor in the pBSKP vector, respectively. 

EXAMPLE 2 

Construction of vectors for seed preferred expression of polypeptides having an elevated 
level of a preselected amino acid. 

A 442bp DNA fragment containing the modified hordothionin gene encoding 
HT12 is isolated from plasmid pBSKP-HT12 by NcoI/BamHI restriction digestion and gel 
purification, and is ligated between the 27 kD gamma zein promoter and 27 kD gamma zein 
terminator of the NcoI/BamHI digested vector PHP3630. PHP3630 is a subclone of the 
endosperm-preferred 27 kD gamma zein gene (Genbank accession number X58197) in the 
pBSKP vector (Stratagene), which is modified by site directed mutagenesis by insertion of 
an NcoI site at the start codon (ATG) of the 27 kD gamma zein coding sequence. The 27 kD 
gamma zein coding sequence is replaced with the HT12 coding sequence. The resulting 
expression vector containing the chimeric gene construct gz::HT12::gz, designated as 
PHP8001 (Seq. ID 3), is verified by extensive restriction digest analysis and DNA 
sequencing. 

Similarly, the 442bp DNA fragment containing the HT12 coding sequence is 
inserted between the globulin 1 promoter and the globulin 1 terminator of the 
embryo-preferred corn globulin 1 gene (Genbank accession number X59083), and between the 
waxy promoter and the waxy terminator of the endosperm-preferred waxy gene (Genbank 
accession number M24258). The globulin 1 and waxy coding sequences, respectively, are 
replaced with the HT12 coding sequence. The resulting chimeric genes glb1::HT12::glb1 
and wx::HT12::wx are designated as PHP7999 (Seq. ID 4) and PHP5025 (Seq. ID 5), respectively. 
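
The promoter::coding sequence::terminator notation used for the HT12 chimeric genes can be pictured as a simple record. The following is a minimal sketch for bookkeeping purposes only; the class name Cassette and its fields are hypothetical, while the element names and vector designations are those given in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """One chimeric gene: promoter, coding sequence, terminator, vector name."""
    promoter: str
    coding: str
    terminator: str
    vector: str

    def label(self):
        # Reproduces the promoter::coding::terminator shorthand of the text.
        return f"{self.promoter}::{self.coding}::{self.terminator}"


# HT12 constructs named in Example 2.
CASSETTES = [
    Cassette("gz", "HT12", "gz", "PHP8001"),      # 27 kD gamma zein, endosperm-preferred
    Cassette("glb1", "HT12", "glb1", "PHP7999"),  # globulin 1, embryo-preferred
    Cassette("wx", "HT12", "wx", "PHP5025"),      # waxy, endosperm-preferred
]

if __name__ == "__main__":
    for cassette in CASSETTES:
        print(cassette.vector, cassette.label())
```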

In a like manner, expression vectors containing genes encoding other derivatives of 
hordothionin (See Rao et al. 1994 Protein Engineering 7(12):1485-1493, and WO 
94/16078 published July 21, 1994), the gene encoding enhanced soybean albumin (ESA) 
(See U.S. Ser. No. 08/618,911), and genes encoding BHL and other derivatives of the 
barley chymotrypsin inhibitor (See U.S. Ser. No. 08/740,682 filed November 1, 1996 and 
PCT/US97/20441 filed October 31, 1997) are constructed by insertion of the 
corresponding coding sequences between the promoter and terminator of the 27 kD gamma 
zein gene, the globulin 1 gene and the waxy gene, respectively. Resulting chimeric genes 
are, for example, gz::ESA::gz and gz::BHL::gz, designated as PHP11260 (Seq. ID 6) and as 
PHP11427 (Seq. ID 7), respectively. 

The resulting expression vectors are used in conjunction with the selectable marker 
expression cassette PHP3528 (enhanced CAMV::Bar::PinII) for particle bombardment 
transformation of maize immature embryos. 

EXAMPLE 3 

Preparation of Transgenic Plants 

The general method of genetic transformation used to produce transgenic maize 
plants is mediated by bombardment of embryogenically responsive immature embryos 
with tungsten particles associated with DNA plasmids, said plasmids consisting of a 
selectable and an unselectable marker gene. 

Preparation of Tissue 

Immature embryos of "High Type II" are the target for particle 
bombardment-mediated transformation. This genotype is the F1 of two purebred genetic lines, parent A 
and parent B, derived from A188 X B73. Both parents are selected for high competence of 
somatic embryogenesis. See Armstrong, et al., "Development and Availability of 
Germplasm with High Type II Culture Formation Response," Maize Genetics Cooperation 
Newsletter, Vol. 65, pp. 92 (1991); incorporated herein in its entirety by reference. 

Ears from F1 plants are selfed or sibbed, and embryos are aseptically dissected from 
developing caryopses when the scutellum first becomes opaque. The proper stage occurs 
about 9-13 days post-pollination, and most generally about 10 days post-pollination, and 
depends on growth conditions. The embryos are about 0.75 to 1.5 mm long. Ears are 
surface sterilized with 20-50% Clorox for 30 min, followed by 3 rinses with sterile 
distilled water. 

Immature embryos are cultured, scutellum oriented upward, on embryogenic 
induction medium comprised of N6 basal salts (Chu, et al., "Establishment of an Efficient 
Medium for Anther Culture of Rice Through Comparative Experiments on the Nitrogen 
Sources," Scientia Sinica, (Peking), Vol. 18, pp. 659-668 (1975); incorporated herein in its 
entirety by reference), Eriksson vitamins (See Eriksson, T., "Studies on the Growth 
Requirements and Growth Measurements of Haplopappus gracilis," Physiol. Plant., Vol. 
18, pp. 976-993 (1965); incorporated herein in its entirety by reference), 0.5 mg/l thiamine 
HCl, 30 g/l sucrose, 2.88 g/l L-proline, 1 mg/l 2,4-dichlorophenoxyacetic acid, 2 g/l 
Gelrite, and 8.5 mg/l AgNO3. 
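
As an aid to preparing batches of other than one liter, the per-liter supplement amounts stated above can be scaled to the batch volume. The sketch below is illustrative only: the function name amounts_for_volume and the 500 ml batch size are assumptions, and the N6 basal salts and Eriksson vitamins are omitted because their individual amounts are given in the cited references rather than in this description.

```python
# Supplements of the embryogenic induction medium, in mg per liter,
# taken from the recipe above (g/l values converted to mg/l).
INDUCTION_MG_PER_L = {
    "thiamine HCl": 0.5,
    "sucrose": 30_000.0,
    "L-proline": 2_880.0,
    "2,4-dichlorophenoxyacetic acid": 1.0,
    "Gelrite": 2_000.0,
    "AgNO3": 8.5,  # filter-sterilized and added after autoclaving
}


def amounts_for_volume(recipe_mg_per_l, volume_ml):
    """Scale a mg-per-liter recipe to the mg needed for volume_ml of medium."""
    litres = volume_ml / 1000.0
    return {name: conc * litres for name, conc in recipe_mg_per_l.items()}


if __name__ == "__main__":
    # Hypothetical 500 ml batch.
    for name, mg in amounts_for_volume(INDUCTION_MG_PER_L, 500).items():
        print(f"{name}: {mg:g} mg")
```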

The medium is sterilized by autoclaving at 121 °C for 15 min and dispensed into 
100 X 25 mm petri dishes. AgNO3 is filter-sterilized and added to the medium after 
autoclaving. The tissues are cultured in complete darkness at 28°C. After about 3 to 7 
days, generally about 4 days, the scutellum of the embryo has swelled to about double its 
original size and the protuberances at the coleorhizal surface of the scutellum indicate the 
inception of embryogenic tissue. Up to 100% of the embryos display this response, but 
most commonly, the embryogenic response frequency is about 80%. 

When the embryogenic response is observed, the embryos are transferred to a 
medium comprised of induction medium modified to contain 120 g/l sucrose. The 
embryos are oriented with the coleorhizal pole, the embryogenically responsive tissue, 
upwards from the culture medium. Ten embryos per petri dish are located in the center of 
a petri dish in an area about 2 cm in diameter. The embryos are maintained on this 
medium for 3-16 hr, preferably 4 hours, in complete darkness at 28°C just prior to 
bombardment with particles associated with plasmid DNAs containing the selectable and 
unselectable marker genes. 

To effect particle bombardment of embryos, the particle-DNA agglomerates are 
accelerated using a DuPont PDS-1000 particle acceleration device. The particle-DNA 
agglomeration is briefly sonicated and 10 µl are deposited on macrocarriers and the ethanol 
allowed to evaporate. The macrocarrier is accelerated onto a stainless-steel stopping 
screen by the rupture of a polymer diaphragm (rupture disk). Rupture is effected by 
pressurized helium. Depending on the rupture disk breaking pressure, the velocity of 
particle-DNA acceleration may be varied. Rupture disk pressures of 200 to 1800 psi are 
commonly used, with those of 650 to 1100 psi being more preferred, and about 900 psi 
being most highly preferred. Rupture disk breaking pressures are additive so multiple 
disks may be used to effect a range of rupture pressures. 

Preferably, the shelf containing the plate with embryos is 5.1 cm below the bottom 
of the macrocarrier platform (shelf #3), but may be located at other distances. To effect 
particle bombardment of cultured immature embryos, a rupture disk and a macrocarrier 
with dried particle-DNA agglomerates are installed in the device. The He pressure 
delivered to the device is adjusted to 200 psi above the rupture disk breaking pressure. A 
petri dish with the target embryos is placed into the vacuum chamber and located in the 
projected path of accelerated particles. A vacuum is created in the chamber, preferably 
about 28 inches Hg. After operation of the device, the vacuum is released and the petri dish 
is removed. 
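
The pressure settings described above reduce to simple arithmetic: breaking pressures of stacked rupture disks add, and the helium supply is set 200 psi above the combined breaking pressure. A minimal sketch follows; the function name helium_setpoint_psi and the individual disk ratings in the second call are assumptions made for illustration.

```python
def helium_setpoint_psi(disk_pressures_psi, margin_psi=200):
    """Return (combined breaking pressure, helium supply pressure) in psi.

    Breaking pressures of stacked disks are treated as additive, and the
    helium supply is set margin_psi above the combined breaking pressure.
    """
    combined = sum(disk_pressures_psi)
    return combined, combined + margin_psi


if __name__ == "__main__":
    # A single disk at the most preferred breaking pressure of about 900 psi.
    print(helium_setpoint_psi([900]))
    # Two stacked disks with hypothetical 450 psi ratings give the same total.
    print(helium_setpoint_psi([450, 450]))
```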

Bombarded embryos remain on the osmotically adjusted medium during 
bombardment, and preferably for two days subsequently, although the embryos may 
remain on this medium for 1 to 4 days. The embryos are transferred to selection medium 
comprised of N6 basal salts, Eriksson vitamins, 0.5 mg/l thiamine HCl, 30 g/l sucrose, 1 
mg/l 2,4-dichlorophenoxyacetic acid, 2 g/l Gelrite, 0.85 mg/l AgNO3 and 3 mg/l 
bialaphos. Bialaphos is added filter-sterilized. The embryos are subcultured to fresh 
selection medium at 10 to 14 day intervals. After about 7 weeks, embryogenic tissue, 
putatively transgenic for both selectable and unselected marker genes, is seen to proliferate 
from about 7% of the bombarded embryos. Putative transgenic tissue is rescued, and that 
tissue derived from individual embryos is considered to be an event and is propagated 
independently on selection medium. Two cycles of clonal propagation are achieved by 
visual selection for the smallest contiguous fragments of organized embryogenic tissue. 

For regeneration of transgenic plants, embryogenic tissue is subcultured to medium 
comprised of MS salts and vitamins (Murashige, T. and F. Skoog, "A revised medium for 
rapid growth and bio assays with tobacco tissue cultures"; Physiologia Plantarum; Vol. 15; 
pp. 473-497; 1962; incorporated herein in its entirety by reference), 100 mg/l myo-inositol, 
60 g/l sucrose, 3 g/l Gelrite, 0.5 mg/l zeatin, 1 mg/l indole-3-acetic acid, 26.4 ng/l 
cis-trans-abscisic acid, and 3 mg/l bialaphos in 100 X 25 mm petri dishes and incubated in 
darkness at 28°C until the development of well-formed, matured somatic embryos can be 
visualized. This requires about 14 days. 

Well-formed somatic embryos are opaque and cream-colored, and are comprised of 
an identifiable scutellum and coleoptile. The embryos are individually subcultured to 
germination medium comprised of MS salts and vitamins, 100 mg/l myo-inositol, 40 g/l 
sucrose and 1.5 g/l Gelrite in 100 X 25 mm petri dishes and incubated under a 16 hr 
light: 8 hr dark photoperiod and 40 µEinsteins m^-2 sec^-1 from cool-white fluorescent tubes. 
After about 7 days, the somatic embryos have germinated and produced a well-defined 
shoot and root. The individual plants are subcultured to germination medium in 125 x 25 
mm glass tubes to allow further plant development. The plants are maintained under a 16 
hr light: 8 hr dark photoperiod and 40 µEinsteins m^-2 sec^-1 from cool-white fluorescent 
tubes. 

After about 7 days, the plants are well-established and are transplanted to 
horticultural soil, hardened off, and potted into commercial greenhouse soil mixture and 
grown to sexual maturity in a greenhouse. An elite inbred line is used as a male to 
pollinate regenerated transgenic plants. 

Preparation of Particles 

Fifteen mg of tungsten particles (General Electric), 0.5 to 1.8 µm, preferably 1 to 
1.8 µm, and most preferably 1 µm, are added to 2 ml of concentrated nitric acid. This 
suspension is sonicated at 0°C for 20 min (Branson Sonifier Model 450, 40% output, 
constant duty cycle). Tungsten particles are pelleted by centrifugation at 10,000 rpm 
(Biofuge) for 1 min and the supernatant is removed. Two ml of sterile distilled water is 
added to the pellet and the suspension is sonicated briefly to resuspend the particles. The 
suspension is pelleted, 1 ml of absolute ethanol is added to the pellet and the suspension is 
sonicated briefly to resuspend the particles. The particles are rinsed, pelleted, and 
resuspended a further 2 times with sterile distilled water, and finally resuspended in 2 ml 
of sterile distilled water. The particles are subdivided into 250 µl aliquots and stored frozen. 



Preparation of particle-plasmid DNA association 

The stock of tungsten particles is sonicated briefly in a water bath sonicator 
(Branson Sonifier Model 450, 20% output, constant duty cycle) and 50 µl is transferred to 
a microfuge tube. Plasmid DNA is added to the particles for a final DNA amount of 0.1 to 
10 µg in 10 µl total volume, and briefly sonicated. Preferably 1 µg total DNA is used. 
Specifically, 5 µl of PHP8001 (gz::HT12::gz) and 5 µl of PHP3528 (enhanced 
CAMV::Bar::PinII), at 0.1 µg/µl in TE buffer, are added to the particle suspension. Fifty 
µl of sterile aqueous 2.5 M CaCl2 are added, and the mixture is briefly sonicated and 
vortexed. Twenty µl of sterile aqueous 0.1 M spermidine are added and the mixture is 
briefly sonicated and vortexed. The mixture is incubated at room temperature for 20 min 
with intermittent brief sonication. The particle suspension is centrifuged, and the 
supernatant is removed. Two hundred fifty µl of absolute ethanol is added to the pellet and 
briefly sonicated. The suspension is pelleted, the supernatant is removed, and 60 µl of 
absolute ethanol is added. The suspension is sonicated briefly before loading the 
particle-DNA agglomeration onto macrocarriers. 
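
The quantities in the preceding paragraph can be cross-checked with a short calculation: 5 µl of each plasmid at 0.1 µg/µl gives the preferred 1 µg of total DNA, and the approximate CaCl2 and spermidine concentrations in the precipitation mixture follow from the added volumes. The sketch below assumes that the volumes are simply additive (50 µl particles, 10 µl DNA, 50 µl CaCl2, 20 µl spermidine); the function names are hypothetical.

```python
def dna_mass_ug(volume_ul, conc_ug_per_ul):
    """Mass of DNA (µg) delivered by a given volume of stock."""
    return volume_ul * conc_ug_per_ul


def diluted_molarity(stock_molar, added_ul, total_ul):
    """Molarity after adding added_ul of stock into a total of total_ul."""
    return stock_molar * added_ul / total_ul


# Volumes (µl) in the order they are combined in the text.
PARTICLES_UL = 50
DNA_UL = 5 + 5          # PHP8001 plus PHP3528
CACL2_UL = 50
SPERMIDINE_UL = 20

total_dna_ug = dna_mass_ug(5, 0.1) + dna_mass_ug(5, 0.1)

# Assumption: volumes are additive and nothing is lost on mixing.
volume_after_cacl2 = PARTICLES_UL + DNA_UL + CACL2_UL
volume_after_spermidine = volume_after_cacl2 + SPERMIDINE_UL

cacl2_molar = diluted_molarity(2.5, CACL2_UL, volume_after_spermidine)
spermidine_molar = diluted_molarity(0.1, SPERMIDINE_UL, volume_after_spermidine)

if __name__ == "__main__":
    print(f"total DNA: {total_dna_ug:g} µg")
    print(f"CaCl2 in final mixture: {cacl2_molar:.3f} M")
    print(f"spermidine in final mixture: {spermidine_molar:.4f} M")
```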



EXAMPLE 4 

Analysis of seed from transgenic plants for recombinant polypeptides having an elevated 
level of a preselected amino acid. 

Preparation of meals from corn seed 
Pooled or individual dry seed harvested from transformed plants from the greenhouse 
or the field are prepared in one of the following ways: 

A. Seed is imbibed in sterile water overnight (16-20 hr) at 4°C. The imbibed seed is 
dissected into embryo, endosperm and pericarp. The embryos and endosperm are 
separately frozen in liquid N2; the pericarps are discarded. Frozen tissue is ground 
with a liquid N2 chilled ceramic mortar and pestle to a fine meal. The meals are 
dried under vacuum and stored at -20°C or -80°C. 

B. Dry whole seed is ground to a fine meal with a ball mill (Klecko), or by hand with 
a ceramic mortar and pestle. For analysis of endosperm only, the embryos are 
removed with a drill and discarded. The remaining endosperm with pericarp is 
ground with a ball mill or a mortar and pestle. 



ELISA analysis 

Rabbit polyclonal anti-HT12 antisera are produced with synthetic HT12 (see Rao et
al., supra) at Bethyl Laboratories. An HT12 ELISA assay is developed and performed by the
Analytical Biochemistry department of Pioneer Hi-Bred International, Inc., essentially as
described by Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor
Publication, New York (1988). Quantitative ELISA assays are first performed on pooled
meals to identify positive events. Positive events are further analyzed by quantitative ELISA
on individual kernels to determine the relative level of HT12 expression and the transgene
segregation ratio. Among 97 events tested, 59 show HT12 expression levels >1000 ppm.
The highest events have HT12 expression levels at 2-5% of the total seed protein. Typical
results for HT12 levels for whole kernels of wild type corn, for one event (TC2031) of corn
transformed with the gz::HT12::gz chimeric gene, expressing HT12 in the endosperm, for
one event (TC320) of corn transformed with the wx::HT12::wx chimeric gene, expressing
HT12 in the endosperm, and for one event (TC2027) of corn transformed with the
glb1::HT12::glb1 chimeric gene, expressing HT12 in the embryo, are in Table 1.
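
The ELISA figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and is not part of the described assay. It assumes that the ppm values are expressed relative to total seed protein, so that 10,000 ppm equals 1%; the function names are hypothetical.

```python
# Illustrative arithmetic on the ELISA figures quoted in the text.
# Assumption: ppm is relative to total seed protein (10,000 ppm = 1%).

def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent."""
    return ppm / 10_000.0


def percent_to_ppm(percent: float) -> float:
    """Convert percent to parts per million."""
    return percent * 10_000.0


def fraction_above(n_above: int, n_total: int) -> float:
    """Fraction of tested events that exceed a threshold."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_above / n_total


# 59 of 97 tested events exceeded 1000 ppm.
share = fraction_above(59, 97)
print(f"Events above 1000 ppm: {share:.1%}")

# HT12 levels reported in Table 1 for the three transgenic events.
for ppm in (6200, 8000, 22600):
    print(f"{ppm} ppm = {ppm_to_percent(ppm):.2f}% of protein")
```

Under this assumption, the stated range of 2-5% of total seed protein for the highest events corresponds to 20,000-50,000 ppm.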

Similarly, antisera are produced, ELISA assays are developed and assays of seed from
transformed plants are performed for other derivatives of hordothionin (see Rao et al.
1994 Protein Engineering 7(12):1485-1493, and WO 94/16078 published July 21, 1994),
for the enhanced soybean albumin (ESA) (see U.S. Ser. No. 08/618,911) and for BHL and
other derivatives of the barley chymotrypsin inhibitor (see U.S. Ser. No. 08/740,682 filed
November 1, 1996 and PCT/US97/20441 filed October 31, 1997), respectively.

Polyacrylamide gel and immuno blot analysis

SDS extracts of meals, molecular weight markers, and a synthetic HT12 positive
control (see Rao et al., supra) are separated on 16.5% or 8-22% polyacrylamide gradient
Tris-Tricine gels (Schagger, H. and Von Jagow, G. 1987 Anal. Biochem., 166:368). For immuno
blot analysis, gels are transferred to PVDF membranes in 100 mM CAPS, pH 11; 10%
methanol using a semidry blotter (Hoefer, San Francisco, CA). After transfer the membrane
is blocked in BLOTTO (4% dry milk in Tris-buffered saline, pH 7.5) (Johnson, D. A.,
Gausch, J. W., Sportsman, J. R., and Elder, J. H. 1984, Gene Anal. Techn., 1:3). The blots
are incubated with rabbit anti-HT12 (same as used for ELISA) diluted 1:2000 to 1:7500 in
BLOTTO for 2 hr at room temperature (22°C) or overnight at 4°C. Blots are washed 4-5X with
BLOTTO, then incubated 1-2 hr with horseradish peroxidase-goat anti-rabbit IgG (Promega,
Madison, WI) diluted 1:7500 to 1:15000 in BLOTTO. After secondary antibody, the blots
are washed 3X with BLOTTO followed by 2 washes with Tris-buffered saline, pH 7.5. Blots
are briefly incubated with enhanced chemiluminescence (ECL, Amersham, Arlington
Heights, IL) substrate, and wrapped in plastic wrap. Reactive bands are visualized by
exposure to x-ray film (Kodak Biomax MR) for short exposure times ranging from 5-120
sec.
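
As a worked illustration of the antibody dilutions named above, the following Python sketch computes how much antiserum stock a given dilution requires. It is a minimal sketch under stated assumptions: a ratio of 1:N is read as one part stock in N parts final volume, and the 10 ml incubation volume is a hypothetical figure that does not come from the protocol.

```python
# Illustrative dilution arithmetic; the incubation volume is hypothetical.
# Assumption: "1:N" means 1 part stock in N parts final volume.

def antibody_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Microliters of stock needed for a 1:dilution_factor dilution."""
    if final_volume_ml <= 0:
        raise ValueError("final_volume_ml must be positive")
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be at least 1")
    return final_volume_ml * 1000.0 / dilution_factor


incubation_volume_ml = 10.0  # hypothetical volume of BLOTTO per blot

# Primary antibody range (1:2000 to 1:7500) and
# secondary antibody range (1:7500 to 1:15000) from the text.
for factor in (2000, 7500, 15000):
    volume = antibody_volume_ul(incubation_volume_ml, factor)
    print(f"1:{factor} in {incubation_volume_ml} ml needs {volume:.2f} ul of stock")
```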

HT12 transgenic seed shows a distinctive band not seen in wild type seed at the correct
molecular weight and position as judged by the HT12 positive control standard and
molecular weight markers. These results indicate that the expressed HT12 prepropeptide is
being correctly processed like native HT in barley. Novel polypeptide bands co-migrating
with the HT12 positive control are also observed in Coomassie stained polyacrylamide gels
loaded with 10 mg total extracted protein, indicating substantial expression and accumulation
of HT12 protein in the seed.

Similarly, other derivatives of hordothionin, soybean albumin, the enhanced soybean
albumin (ESA), BHL and other derivatives of the barley chymotrypsin inhibitor are
detected by polyacrylamide gel and immuno blot analysis.

Amino acid composition analysis 

Meals from seed, endosperm or embryo that express a recombinant polypeptide
having an elevated level of a preselected amino acid are sent to the University of Iowa
Protein Structure Facility for amino acid composition analysis using standard protocols for
digestion and analysis.

Typical results for the amino acid composition of whole kernels of wild type corn, for
one event (TC2031) of corn transformed with the gz::HT12::gz chimeric gene, expressing
HT12 in the endosperm, for one event (TC320) of corn transformed with the wx::HT12::wx
chimeric gene, expressing HT12 in the endosperm, and for one event (TC2027) of corn
transformed with the glb1::HT12::glb1 chimeric gene, expressing HT12 in the embryo, are in
Table 1.

Table 1: HT12 ELISA analysis and amino acid composition of meal from whole kernels
from wild type corn and from transformed corn expressing recombinant HT12

transgene      none          wx::HT12::wx   gz::HT12::gz   glb1::HT12::glb1
event          wild-type     TC320          TC2031         TC2027

ELISA
HT12           protein ppm   protein ppm    protein ppm    protein ppm
               0.00          6200           8000           22600

AA             Meal %        Meal %         Meal %         Meal %
               n=3           n=2            n=3            n=4
Lys            0.29          0.38           0.39           0.24
Arg            0.52          0.58           0.56           0.45
Cys            0.12          0.19           0.17           0.22


The results in Table 1 demonstrate that corn expressing recombinant HT12 in the
endosperm shows a significant increase of the preselected amino acid lysine.
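
The size of the change can be made explicit by comparing each event with the wild type. The Python sketch below is illustrative only: it uses the "Meal %" values reported in Table 1, and the helper name `percent_change` is hypothetical.

```python
# Relative change in amino acid content versus wild type,
# using the "Meal %" values reported in Table 1.

WILD_TYPE = {"Lys": 0.29, "Arg": 0.52, "Cys": 0.12}

EVENTS = {
    "TC320": {"Lys": 0.38, "Arg": 0.58, "Cys": 0.19},   # endosperm expression
    "TC2031": {"Lys": 0.39, "Arg": 0.56, "Cys": 0.17},  # endosperm expression
    "TC2027": {"Lys": 0.24, "Arg": 0.45, "Cys": 0.22},  # embryo expression
}


def percent_change(value: float, baseline: float) -> float:
    """Change of value relative to baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


for event, composition in EVENTS.items():
    changes = ", ".join(
        f"{aa} {percent_change(composition[aa], WILD_TYPE[aa]):+.0f}%"
        for aa in WILD_TYPE
    )
    print(f"{event}: {changes}")
```

With these values, lysine rises by roughly a third in the two endosperm-expressing events, while the embryo-expressing event shows lower lysine than wild type in whole-kernel meal.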



Table 2: SEQUENCE INFORMATION

SEQUENCE ID                                          PROMOTER    GENE
Seq. 1: pBSKP-HT                                     None        3361-2947
Seq. 2: pBSKP-HT12                                   None        3361-2947
Seq. 3: PHP8001 gz::HT12::gz expression vector       676-2198    2199-2612
Seq. 4: PHP7999 glb1::HT12::glb1 expression vector   3271-1834   1834-1420
Seq. 5: PHP5025 wx::HT::wx expression vector         43-1342     1343-1757
Seq. 6: PHP11260 gz::ESA::gz expression vector       676-2198    2199-2675
Seq. 7: PHP11427 gz::BHL::gz                         676-2198    2199-2450
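
The coordinate ranges in Table 2 can be used to pull a promoter or gene region out of the corresponding sequence. The Python sketch below is a minimal illustration under stated assumptions: coordinates are taken to be 1-based and inclusive, and a descending range such as 3361-2947 is assumed to denote a feature on the complementary strand. The function names and the toy sequence are hypothetical.

```python
# Extract a feature from a vector sequence using Table 2 style coordinates.
# Assumptions: 1-based inclusive positions; start > end means the feature
# lies on the complementary strand and is returned as reverse complement.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def feature_length(start: int, end: int) -> int:
    """Number of bases covered by an inclusive coordinate range."""
    return abs(start - end) + 1


def extract_feature(seq: str, start: int, end: int) -> str:
    """Return the bases from start to end, read in the stated direction."""
    if min(start, end) < 1 or max(start, end) > len(seq):
        raise ValueError("coordinates fall outside the sequence")
    if start <= end:
        return seq[start - 1:end].upper()
    return reverse_complement(seq[end - 1:start])


toy = "ATGCCCTAA"  # toy sequence, not taken from the listing
print(extract_feature(toy, 1, 3))    # forward range
print(extract_feature(toy, 3, 1))    # same bases, complementary strand
print(feature_length(3361, 2947))    # gene range given for Seq. 1
```

To apply this to a listed sequence, the block spacing and running base counts would first have to be stripped from the sequence text.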



The invention is not limited to the exact details shown and described, for it should
be understood that many variations and modifications may be made while remaining
within the spirit and scope of the invention defined by the claims.

WHAT IS CLAIMED IS:

1. A transformed cereal plant seed, the endosperm of which is characterized as
having an elevated level of at least one preselected amino acid compared to a seed
from a corresponding plant which has not been transformed, wherein the amino
acid is lysine, cysteine, threonine, tryptophan, arginine, valine, leucine, isoleucine,
histidine or combinations thereof and optionally methionine.

2. The seed according to Claim 1 wherein the preselected amino acid is lysine,
threonine or tryptophan and optionally a sulfur-containing amino acid.

3. The seed according to Claim 2 wherein the preselected amino acid is lysine.

4. The seed according to Claim 3 wherein the preselected amino acid is lysine and a
sulfur-containing amino acid.

5. The seed according to Claim 1 wherein the plant is selected from the group
consisting of maize, wheat, rice, barley, oats, sorghum, millet and rye.

6. The seed according to Claim 5 which is a maize seed.

7. The seed according to Claim 1 wherein the plant expresses a transgenic protein
having an elevated level of the preselected amino acid.

8. The seed according to Claim 7 wherein the protein is barley chymotrypsin
inhibitor, barley alpha hordothionin, soybean 2S albumin protein, rice high
methionine protein, sunflower high methionine protein or derivatives of each
protein.

9. The seed according to Claim 1 wherein the amount of preselected amino acid in the
seed is increased at least about 10 percent by weight compared to a corresponding
seed which has not been transformed.

10. The seed according to Claim 9 wherein the amount of the preselected amino acid in
the seed is about 10 percent by weight to about 10 times greater compared to a
corresponding seed which has not been transformed.

11. The seed according to Claim 10 wherein the amount of the preselected amino acid
in the seed is about 15 percent by weight to about 10 times greater compared to a
corresponding seed which has not been transformed.

12. The seed according to Claim 11 wherein the amount of the preselected amino acid
in the seed is about 20 percent by weight to about 10 times greater compared to a
corresponding seed which has not been transformed.

13. An expression cassette comprising a seed endosperm-preferred promoter operably
linked to a structural gene encoding a polypeptide elevated in content of a
preselected amino acid.

14. The cassette according to Claim 13 wherein the promoter is a gamma zein promoter
or a waxy promoter.

15. A vector comprising the expression cassette of Claim 13.

16. A plant cell transformed with the vector of Claim 15.

17. A transformed plant comprising the vector of Claim 15.

18. A seed product obtainable from the transformed seed of Claim 1.

19. A seed from a cereal plant which has been transformed to express a heterologous
protein in the endosperm of the seed, wherein the seed exhibits an elevated level of
an essential amino acid compared to a plant which has not been transformed.

20. A method for increasing the nutritional value of a cereal plant seed comprising:
transforming a host plant cell with a vector comprising an expression cassette
comprising a seed endosperm-preferred promoter operably linked to a structural
gene encoding a polypeptide elevated in content of a preselected amino acid;
recovering the transformed cells; regenerating a transformed plant; and recovering
the seeds therefrom.

21. A seed produced by the method of Claim 20.

SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: Jung, Rudolf
Beach, Larry R.
Dress, Virginia M
Rao, A. Gururaj
Ranch, Jerome P.
Ertl, David S.
Higgins, Regina K.

(ii) TITLE OF THE INVENTION: Alteration of Amino Acid Compositions in Seeds

(iii) NUMBER OF SEQUENCES: 13

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Pioneer Hi-Bred International, Inc.
(B) STREET: 7100 NW 62nd Avenue, P.O. Box 1000
(C) CITY: Johnston
(D) STATE: IA
(E) COUNTRY: USA
(F) ZIP: 50131

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: DOS
(D) SOFTWARE: FastSEQ for Windows Version 2.0

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Michel, Marianne H
(B) REGISTRATION NUMBER: 35,286
(C) REFERENCE/DOCKET NUMBER: 0815

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 515-334-4467
(B) TELEFAX: 515-334-6883
(C) TELEX:

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3363 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTAGTGAGG GTTAATTGCG 60
CGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT 120
CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC 180
TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC 240 
CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT 300 
TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA 360 
GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC 420 
ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT 480 
TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG 540
CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC 600 
TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC 660 
GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC 720
AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC 780 
TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 840 
AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT 900 
AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC 960 
TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT 1020
TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG 1080
ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 1140
ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA 1200 
TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG 1260 
GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG 1320
TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA 1380
GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG 1440
CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA 1500
GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC 1560
ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA 1620
AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG 1680
ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT 1740
AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC 1800
AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG 1860
GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG 1920
GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT 1980
GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA 2040 
GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA 2100
CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC 2160 
ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA 2220 
GTGCCACCTA AATTGTAAGC GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA 2280
TCAGCTCATT TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT 2340 
AGACCGAGAT AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG 2400 
TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA CTACGTGAAC 2460 
CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA 2520 
AAGGGAGCCC CCGATTTAGA GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG 2580 
GGAAGAAAGC GAAAGGAGCG GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG 2640 
TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCCAT TCGCCATTCA 2700 
GGCTGCGCAA CTGTTGGGAA GGGCGATCGG TGCGGGCCTC TTCGCTATTA CGCCAGCTGG 2760 
CGAAAGGGGG ATGTGCTGCA AGGCGATTAA GTTGGGTAAC GCCAGGGTTT TCCCAGTCAC 2820
GACGTTGTAA AACGACGGCC AGTGAGCGCG CGTAATACGA CTCACTATAG GGCGAATTGG 2880 
AGCTCCACCG CGGTGGCGGC CGCTCTAGAA CTAGTGGATC CGTCGACTAG AGGGCCCGAC 2940 
GTCGAACTTA GGCACTAAGG GATGTGAGGC CAGCATCACC GTTGCAGAAA TTGACACAAG 3000 
CATCACCACA ATTTTCCAAA TAGAGTTTCA TTTCTTCGTC GTCAGCAGCT GCGTTGACCA 3060 
TGTAGTCACA CATGGAAGCC CTACACCCCA AGTTGCAATA CTTGACGGTG TCTGGTTCAT 3120
CTGAGTTGGA CACAAGGGCC AATTTGGGGA AGCCTGTAGG GCATTTTCCG CTACTTGTGA 3180
GTTTACACCT ACAGACGCCT GCGCATAACT TCTGAGCACC ACGGACGCGG CAAAGGTTGT 3240 
AGCAGTTTCT TCCTAGGGTG CTCCTGCAGC AACTCTTGCC TTCTACTTGC ACCTGTTCGA 3300 
GAACCAACCC CAGTATAAGT AAACACACCA TCACACCCTT GAGGCCCTTG CTGGTGGCCA 3360 
TGG 3363 
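
The listing format used here, blocks of ten bases followed by a running base count at the end of each line, allows a mechanical consistency check. The Python sketch below is illustrative and is not part of the filing. It assumes each line ends with the cumulative count, tolerates stray spaces inside blocks and counts as well as leading margin numbers, and accepts only the symbols A, C, G, T and N as bases. The function names and the toy lines are hypothetical.

```python
import re

# Trailing running count, possibly split by stray spaces (e.g. "1 80").
_TRAILING_NUMBER = re.compile(r"(\d[\d\s]*)$")


def parse_listing_line(line: str):
    """Split one listing line into (bases, running_count_or_None)."""
    text = line.strip()
    match = _TRAILING_NUMBER.search(text)
    if match:
        position = int("".join(match.group(1).split()))
        text = text[:match.start()]
    else:
        position = None
    # Keep base symbols only; spaces and margin numbers are dropped.
    bases = "".join(re.findall(r"[ACGTN]", text))
    return bases, position


def assemble(lines) -> str:
    """Join listing lines, checking each running count against the bases read."""
    parts = []
    total = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        bases, position = parse_listing_line(line)
        total += len(bases)
        if position is not None and position != total:
            raise ValueError(
                f"line {number}: count says {position}, but {total} bases read"
            )
        parts.append(bases)
    return "".join(parts)


# Toy lines (not from the listing) with a margin number and split digits.
toy_lines = ["ACGTACGTAC GTACGTACGT 20", "5 AC GTACGTAC 3 0", "", "ACG 33"]
sequence = assemble(toy_lines)
print(len(sequence))
```

A mismatch between the running count and the number of bases read raises an error that names the offending line, which helps locate lines where characters were lost or misread.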



(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3365 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TCGACCTCGA GGGGGGGCCC GGTACCCAGC TTTTGTTCCC TTTAGTGAGG GTTAATTGCG 60
CGCTTGGCGT AATCATGGTC ATAGCTGTTT CCTGTGTGAA ATTGTTATCC GCTCACAATT 120
CCACACAACA TACGAGCCGG AAGCATAAAG TGTAAAGCCT GGGGTGCCTA ATGAGTGAGC 180
TAACTCACAT TAATTGCGTT GCGCTCACTG CCCGCTTTCC AGTCGGGAAA CCTGTCGTGC 240 
CAGCTGCATT AATGAATCGG CCAACGCGCG GGGAGAGGCG GTTTGCGTAT TGGGCGCTCT 300 
TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA 360 
GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC 420
ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT 480 
TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG 540 
CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC 600 
TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC 660 
GTGGCGCTTT CTCATAGCTC ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC 720
AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC 780 
TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 840 
AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT 900 
AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC 960 

TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT 1020
TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG 1080
ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 1140
ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA 1200
TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG 1260
GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG 1320
TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA 1380
GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG 1440
CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA 1500
GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC 1560
ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA 1620
AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG 1680
ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT 1740
AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC 1800
AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG 1860
GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG 1920
GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT 1980
GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA 2040 
GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA 2100 
CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC 2160
ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA 2220
GTGCCACCTA AATTGTAAGC GTTAATATTT TGTTAAAATT CGCGTTAAAT TTTTGTTAAA 2280
TCAGCTCATT TTTTAACCAA TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT 2340 
AGACCGAGAT AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG 2400 
TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA CTACGTGAAC 2460
CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA AGCACTAAAT CGGAACCCTA 2520
AAGGGAGCCC CCGATTTAGA GCTTGACGGG GAAAGCCGGC GAACGTGGCG AGAAAGGAAG 2580
GGAAGAAAGC GAAAGGAGCG GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG 2640
TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCCAT TCGCCATTCA 2700 
GGCTGCGCAA CTGTTGGGAA GGGCGATCGG TGCGGGCCTC TTCGCTATTA CGCCAGCTGG 2760 

CGAAAGGGGG ATGTGCTGCA AGGCGATTAA GTTGGGTAAC GCCAGGGTTT TCCCAGTCAC 2820
GACGTTGTAA AACGACGGCC AGTGAGCGCG CGTAATACGA CTCACTATAG GGCGAATTGG 2880 
AGCTCCACCG CGGTGGCGGC CGCTCTAGAA CTAGTGGATC CGTCGACTAG AGGGCCCGAC 2940 
GTCGAACTTA GGCACTAAGG GATGTGAGGC CAGCATCACC GTTGCAGAAA TTGACACAAG 3000 
CATCACCACA ATTTTCCAAA TAGAGTTTCA TTTCTTCGTC GTCAGCAGCT GCGTTGACCA 3060 

TGTAGTCACA CATGGAAGCC CTACACCCCA AGTTGCAATA CTTGACGGTG TCTGGTTCAT 3120
CTGAGTTGGA CACAAGGGCC AATTTGGGGA AGCCTTTCGG GCATTTTCCG CTACTAGTCA 3180
GCTTACACTT GCAGACGCCT GCGCAAAGCT TCTTGGCGCC TTTGACTTTG CAAAGGTTGT 3240 
AGCACTTCCT TCCCAGGGTA CTCTTGCAGC AACTCTTGCC TTCTACTTGC ACCTGTTCGA 3300 
GAACCAACCC CAGTATAAGT AAACACACCA TCACACCCTT GAGGCCCTTG CTGGTGGCCA 3360 

TGGTG 3365

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5360 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60
ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120
GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180
CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240
CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300 
CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 
AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420 

CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480
CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540 
GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600 
TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660 
CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720 

ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780
GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840 
ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900 
CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960 
ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020
TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080
ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140
CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200
TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260
AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320
ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380
GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440
AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500
TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560
GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTCC AAGAATTTGT TGTATCCTTA 1620
ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680
ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740
TATACATGCT TACAGCTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800
TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860
CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920
TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980
ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040 
TACAGCCGTC TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100
TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160
TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GGCCACCAGC AAGGGCCTCA 2220
AGGGTGTGAT GGTGTGTTTA CTTATACTGG GGTTGGTTCT CGAACAGGTG CAAGTAGAAG 2280
GCAAGAGTTG CTGCAAGAGT ACCCTGGGAA GGAAGTGCTA CAACCTTTGC AAAGTCAAAG 2340 
GCGCCAAGAA GCTTTGCGCA GGCGTCTGCA AGTGTAAGCT GACTAGTAGC GGAAAATGCC 2400 
CGAAAGGCTT CCCCAAATTG GCCCTTGTGT CCAACTCAGA TGAACCAGAC ACCGTCAAGT 2460 
ATTGCAACTT GGGGTGTAGG GCTTCCATGT GTGACTACAT GGTCAACGCA GCTGCTGACG 2520 

ACGAAGAAAT GAAACTCTAT TTGGAAAATT GTGGTGATGC TTGTGTCAAT TTCTGCAACG 2580
GTGATGCTGG CCTCACATCC CTTAGTGCCT AAGTTCGACG TCGGGCCCTC TAGTCGACGG 2640 
ATCCCCGGCG GTGTCCCCCA CTGAAGAAAC TATGTGCTGT AGTATAGCCG CTGCCCGCTG 2700 
GCTAGCTAGC TAGTTGAGTC ATTTAGCGGC GATGATTGAG TAATAATGTG TCACGCATCA 2760 
CCATGCATGG GTGGCAGTGT CAGTGTGAGC AATGACCTGA ATGAACAATT GAAATGAAAA 2820
GAAAAAAGTA TTGTTCCAAA TTAAACGTTT TAACCTTTTA ATAGGTTTAT ACAATAATTG 2880
ATATATGTTT TCTGTATATG TCTAATTTGT TATCATCCAT TTAGATATAG ACAAAAAAAA 2940 
ATCTAAGAAC TAAAACAAAT GCTAATTTGA AATGAAGGGA GTATATATTG GGATAATGTC 3000 
GATGAGATCC CTCGTAATAT CACCGACATC ACACGTGTCC AGTTAATGTA TCAGTGATAC 3060 
GTGTATTCAC ATTTGTTGCG CGTAGGCGTA CCCAACAATT TTGATCGACT ATCAGAAAGT 3120
CAACGGAAGC GAGTCGACCT CGAGGGGGGG CCCGGTACCC AGCTTTTGTT CCCTTTAGTG 3180
AGGGTTAATT GCGCGCTTGG CGTAATCATG GTCATAGCTG TTTCCTGTGT GAAATTGTTA 3240 
TCCGCTCACA ATTCCACACA ACATACGAGC CGGAAGCATA AAGTGTAAAG CCTGGGGTGC 3300 
CTAATGAGTG AGCTAACTCA CATTAATTGC GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG 3360 
AAACCTGTCG TGCCAGCTGC ATTAATGAAT CGGCCAACGC GCGGGGAGAG GCGGTTTGCG 3420 

TATTGGGCGC TCTTCCGCTT CCTCGCTCAC TGACTCGCTG CGCTCGGTCG TTCGGCTGCG 3480
GCGAGCGGTA TCAGCTCACT CAAAGGCGGT AATACGGTTA TCCACAGAAT CAGGGGATAA 3540
CGCAGGAAAG AACATGTGAG CAAAAGGCCA GCAAAAGGCC AGGAACCGTA AAAAGGCCGC 3600
GTTGCTGGCG TTTTTCCATA GGCTCGGCCC CCCTGACGAG CATCACAAAA ATCGACGCTC 3660
AAGTCAGAGG TGGCGAAACC CGACAGGACT ATAAAGATAC CAGGCGTTTC CCCCTGGAAG 3720 
CTCCCTCGTG CGCTCTCCTG TTCCGACCCT GCCGCTTACC GGATACCTGT CCGCCTTTCT 3780 
CCCTTCGGGA AGCGTGGCGC TTTCTCATAG CTCACGCTGT AGGTATCTCA GTTCGGTGTA 3840
GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA CGAACCCCCC GTTCAGCCCG ACCGCTGCGC 3900 
CTTATCCGGT AACTATCGTC TTGAGTCCAA CCCGGTAAGA CACGACTTAT CGCCACTGGC 3960 
AGCAGCCACT GGTAACAGGA TTAGCAGAGC GAGGTATGTA GGCGGTGCTA CAGAGTTCTT 4020 
GAAGTGGTGG CCTAACTACG GCTACACTAG AAGGACAGTA TTTGGTATCT GCGCTCTGCT 4080 

GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG TAGCTCTTGA TCCGGCAAAC AAACCACCGC 4140
TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA GCAGATTACG CGCAGAAAAA AAGGATCTCA 4200
AGAAGATCCT TTGATCTTTT CTACGGGGTC TGACGCTCAG TGGAACGAAA ACTCACGTTA 4260
AGGGATTTTG GTCATGAGAT TATCAAAAAG GATCTTCACC TAGATCCTTT TAAATTAAAA 4320
ATGAAGTTTT AAATCAATCT AAAGTATATA TGAGTAAACT TGGTCTGACA GTTACCAATG 4380
CTTAATCAGT GAGGCACCTA TCTCAGCGAT CTGTCTATTT CGTTCATCCA TAGTTGCCTG 4440
ACTCCCCGTC GTGTAGATAA CTACGATACG GGAGGGCTTA CCATCTGGCC CCAGTGCTGC 4500 
AATGATACCG CGAGACCCAC GCTCACCGGC TCCAGATTTA TCAGCAATAA ACCAGCCAGC 4560 
CGGAAGGGCC GAGCGCAGAA GTGGTCCTGC AACTTTATCC GCCTCCATCC AGTCTATTAA 4620 
TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC 4680 

CATTGCTACA GGCATCGTGG TGTCACGCTC GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG 4740
TTCCCAACGA TCAAGGCGAG TTACATGATC CCCCATGTTG TGCAAAAAAG CGGTTAGCTC 4800 
CTTCGGTCCT CCGATCGTTG TCAGAAGTAA GTTGGCCGCA GTGTTATCAC TCATGGTTAT 4860 
GGCAGCACTG CATAATTCTC TTACTGTCAT GCCATCCGTA AGATGCTTTT CTGTGACTGG 4920 
TGAGTACTCA ACCAAGTCAT TCTGAGAATA GTGTATGCGG CGACCGAGTT GCTCTTGCCC 4980 

GGCGTCAATA CGGGATAATA CCGCGCCACA TAGCAGAACT TTAAAAGTGC TCATCATTGG 5040
AAAACGTTCT TCGGGGCGAA AACTCTCAAG GATCTTACCG CTGTTGAGAT CCAGTTCGAT 5100
GTAACCCACT CGTGCACCCA ACTGATCTTC AGCATCTTTT ACTTTCACCA GCGTTTCTGG 5160
GTGAGCAAAA ACAGGAAGGC AAAATGCCGC AAAAAAGGGA ATAAGGGCGA CACGGAAATG 5220 
TTGAATACTC ATACTCTTCC TTTTTCAATA TTATTGAAGC ATTTATCAGG GTTATTGTCT 5280 

CATGAGCGGA TACATATTTG AATGTATTTA GAAAAATAAA CAAATAGGGG TTCCGCGCAC 5340
ATTTCCCCGA AAAGTGCCAC 5360 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5511 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA 60
CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG 120
TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA GCAGATTGTA CTGAGAGTGC 180
ACCATATGCG GTGTGAAATA CCGCACAGAT GCGTAAGGAG AAAATACCGC ATCAGGCGCC 240 
ATTCGCCATT CAGGCTGCGC AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT 300 

TACGCCAGCT GGCGAAAGGG GGATGTGCTG CAAGGCGATT AAGTTGGGTA ACGCCAGGGT 360
TTTCCCAGTC ACGACGTTGT AAAACGACGG CCAGTGAATT CTTTTATGAA TAATAATAAT 420 
GCATATCTGT GCATTACTAC CTGGGATACA AGGGCTTCTC CGCCATAACA AATTGAGTTG 480 
CGATGCTGAG AACGAACGGG GAAGAAAGTA AGCGCCGCCC AAAAAAAACG AACATGTACG 540 
TCGGCTATAG CAGGTGAAAG TTCGTGCGCC AATGAAAAGG GAACGATATG CGTTGGGTAG 600 

TTGGGATACT TAAATTTGGA GAGTTTGTTG CATACACTAA TCCACTAAAG TTGTCTATCT 660
TTTTAACAGC TCTAGGCAGG ATATAAGATT TATATCTAAT CTGTTGGAGT TGCTTTTAGA 720 
GTAACTTTTC TCTCTGTTTC GTTTATAGCC GATTAGCACA AAATTAAACT AGGTGACGAG 780 
AAATAAAGAA AAACGGAGGC AGTAAAAAAT ACCCAAAAAA ATACTTGGAG ATTTTTGTCT 840 
CAAAATTATC TTCTAATTTT AAAAGCTACA TATTAAAAAT ACTATATATT AAAAATACTT 900
CGAGATCATT GCTTGGGATG GGCAGGGCCA ATAGCTAATT GCTAAGGATG GGCTATATTT 960
ATGTATCGTC TGAAACATGT AGGGGCTAAT AGTTAGATGA CTAATTTGCT GTGTTCGTAC 1020
GGGGTGCTGT TTGAGCCTAG CGATGAAGGG TCATAGTTTC ATACAAGAAC TCACTTTTGG 1080
TTCGTCTGCT GTGTCTGTTC TCAGCGTAAC GGCATCAATG GATGCCAAAC TCCGCAAGGG 1140
GACAAATGAA GAAGCGAAGA GATTATAGAA CACGCACGTG TCATTATTTA TTTATGGACT 1200 

TGCCTCAGTA GCTTACAGCA TCGTACCCGC ACGTACATAC TACAGAGCCA CACTTATTGC 1260
ACTGCCTGCC GCTTACGTAC ATAGTTAACA CGCAGAGAGG TATATACATA CACGTCCAAC 1320



GTCTCCACTC AGGCTCATGC TACGTACGCA CGTCGGTCGC GCGCCACCCT CTCGTTGCTT 1380
CCTGCTCGTT TTGGCGAGCT AGAGGGCCCG ACGTCGAACT TAGGCACTAA GGGATGTGAG 1440
GCCAGCATCA CCGTTGCAGA AATTGACACA AGCATCACCA CAATTTTCCA AATAGAGTTT 1500
CATTTCTTCG TCGTCAGCAG CTGCGTTGAC CATGTAGTCA CACATGGAAG CCCTACACCC 1560
CAAGTTGCAA TACTTGACGG TGTCTGGTTC ATCTGAGTTG GACACAAGGG CCAATTTGGG 1620
GAAGCCTTTC GGGCATTTTC CGCTACTAGT CAGCTTACAC TTGCAGACGC CTGCGCAAAG 1680
CTTCTTGGCG CCTTTGACTT TGCAAAGGTT GTAGCACTTC CTTCCCAGGG TACTCTTGCA 1740
GCAACTCTTG CCTTCTACTT GCACCTGTTC GAGAACCAAC CCCAGTATAA GTAAACACAC 1800
CATCACACCC TTGAGGCCCT TGCTGGTGGC CATGGTGTAG TGTCGACTGT GATATCCTCG 1860
GGTGTGTGTT GGATCCTTGG GTTGGCTGTA TGCAGAACTA AAGCGGAGGT GGCGCGCATT 1920
TATACCAGCG CCGGGCCCTG GTACGTGGCG CGGCCGCGCG GCTACGTGGA GGAAGGCTGC 1980
GTGGCAGCAG ACACACGGGT CGCCACGTCC CGCCGTACTC TCCTTACCGT GCTTATCCGG 2040 
GCTCCGGCTC GGTGCACGCC AGGGTGTGGC CGCCTCTGAG CAGACTTTGT CGTGTTCCAC 2100 
AGTGGTGTCG TGTTCCGGGG ACTCCGATCC GCGGCGAGCG ACCGAGCGTG TAAAAGAGTT 2160 

CCTACTAGGT ACGTTCATTG TATCTGGACG ACGGGCAGCG GACAATTTGC TGTAAGAGAG 2220
GGGCAGTTTT TTTTTAGAAA AACAGAGAAT TCCGTTGAGC TAATTGTAAT TCAACAAATA 2280 
AGCTATTAGT TGGTTTTAGC TTAGATTAAA GAAGCTAACG ACTAATAGCT AATAATTAGT 2340 
TGGTCTATTA GTTGACTCAT TTTAAGGCCC TGTTTCAATC TCGCGAGATA AACTTTAGCA 2400 
GCTATTTTTT AGCTACTTTT AGCCATTTGT AATCTAAACA GGAGAGCTAA TGGTGGTAAT 2460 

TGAAACTAAA CTTTAGCACT TCAATTCATA TAGCTAAAGT TTAGCAGGAA GCTAAACTTT 2520
ATCCCGTGAG ATTGAAACGG GGCCTAAATC TCTCAGCTAT TTTTGATGCA AATTACTGTC 2580 
ACTACTGGAA TCGAGCGCTT TGCCGAGTGT CAAAGCCTGA AAAACACTCC GTAAAGACTT 2640 
TGCCTAGTGT GACACTCGAC AAAGAGATCT CGACGAACAG TACATCGACA ACGGCTTCTT 2700 
TGTCGAGTAC TTTTTATCGG ACACTTGACA AAGTCTTTGT CGAGTGAACT ACATTGAAAC 2760 

TCTATGATTT TATGTGTAGG TCACTTAGGT TTCTACACAT AGTACGTCAC AACTTTACCG 2820
AAACATTATC AAATTTTTAT CACAACCTCT ATATATGATA TCATGACATG TGGACAAGTT 2880 
TCATTAATTT CTGACTTTAT TTGTGTTTTA TACAATTTTT AAACAACTAG ATAACAAGTT 2940 
CACGGTCATG TTTAGTGAGC ATGGTGCTTG AAGATTCTGG TCTGCTTCTG AAATCGGTCG 3000 
TAACTTGTGC TAGATAACAT GCATATCATT TATTTTGCAT GCACGGTTTT CCATGTTTCG 3060
AGTGACTTGC AGTTTAAATG TGAATTTTCC GAAGAAATTC AAATAAACGA ACTAAATCTA 3120
ATATTTATAG AAAACATTTT TGTAAATATG TAATTGTGCC AAAATGGTAC ATGTAGATCT 3180
ACATAGTGTA GGAACATACC ACAAAAAGTT TGGTTGGCAA AATAAAAAAA ATAAAATATA 3240 
CTTTATCGAG TGTCCAAGGA TGGCACTCGG CAAGCTTGGC GTAATCATGG TCATAGCTGT 3300 
TTCCTGTGTG AAATTGTTAT CCGCTCACAA TTCCACACAA CATACGAGCC GGAAGCATAA 3360 

AGTGTAAAGC CTGGGGTGCC TAATGAGTGA GCTAACTCAC ATTAATTGCG TTGCGCTCAC 3420
TGCCCGCTTT CCAGTCGGGA AACCTGTCGT GCCAGCTGCA TTAATGAATC GGCCAACGCG 3480 
CGGGGAGAGG CGGTTTGCGT ATTGGGCGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC 3540 
GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT CAGCTCACTC AAAGGCGGTA ATACGGTTAT 3600 
CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG CAAAAGGCCA 3660 

GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC 3720
ATCACAAAAA TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC 3780 
AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC GCTCTCCTGT TCCGACCCTG CCGCTTACCG 3840 
GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC TCACGCTGTA 3900 
GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG 3960 

TTCAGCCCGA CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC 4020
ACGACTTATC GCCACTGGCA GCAGCCACTG GTAACAGGAT TAGCAGAGCG AGGTATGTAG 4080 
GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA AGGACAGTAT 4140 
TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT 4200 
CCGGCAAACA AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC 4260 

GCAGAAAAAA AGGATCTCAA GAAGATCCTT TGATCTTTTC TACGGGGTCT GACGCTCAGT 4320
GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG ATCTTCACCT 4380
AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT 4440
GGTCTGACAG TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC 4500 
GTTCATCCAT AGTTGCCTGA CTCCCCGTCG TGTAGATAAC TACGATACGG GAGGGCTTAC 4560 

CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT CCAGATTTAT 4620
CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG 4680 
CCTCCATCCA GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA 4740 
GTTTGCGCAA CGTTGTTGCC ATTGCTACAG GCATCGTGGT GTCACGCTCG TCGTTTGGTA 4800 
TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC CCCATGTTGT 4860 

GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG 4920
TGTTATCACT CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA 4980 
GATGCTTTTC TGTGACTGGT GAGTACTCAA CCAAGTCATT CTGAGAATAG TGTATGCGGC 5040 
GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT AGCAGAACTT 5100
TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC 5160

TGTTGAGATC CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA 5220

CTTTCACCAG CGTTTCTGGG TGAGCAAAAA CAGGAAGGCA AAATGCCGCA AAAAAGGGAA 5280
TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT TATTGAAGCA 5340
TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC 5400
AAATAGGGGT TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA 5460
TTATCATGAC ATTAACCTAT AAAAATAGGC GTATCACGAG GCCCTTTCGT C 5511 

(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

GTTGGGAGCT CTCCCATATG GTCGACCTGC AGGCGGCCGC TCTAGAACTA GTGGATCCCC 60 
CCCTCGAGGT CGACGGTATC GATAAGCTTG ATATCTTACA AGGCCCAGCC CAGCGACCTA 120
TTACACAGCC CGCTCGGGCC CGCGACGTCG GGACACATCT TCTTCCCCCT TTTGGTGAAG 180
CTCTGCTCGC AGCTGTCCGG CTCCTTGGAC GTTCGTGTGG CAGATTCATC TGTTGTCTCG 240 
TCTCCTGTGC TTCCTGGGTA GCTTGTGTAG TGGAGCTGAC ATGGTCTGAG CAGGCTTAAA 300 
ATTTGCTCGT AGACGAGGAG TACCAGCACA GCACGTTGCG GATTTCTCTG CCTGTGAAGT 360 
GCAACGTCTA GGATTGTCAC ACGCCTTGGT CGCGTCGCGT CGCGTCGCGT CGATGCGGTG 420 
GTGAGCAGAG CAGCAACAGC TGGGCGGCCC AACGTTGGCT TCCGTGTCTT CGTCGTACGT 480
ACGCGCGCGC CGGGGACACG CAGCAGAGAG CGGAGAGCGA GCCGTGCACG GGGAGGTGGT 540 
GTGGAAGTGG AGCCGCGCGC CCGGCCGCCC GCGCCCGGTG GGCAACCCAA AAGTACCCAC 600 
GACAAGCGAA GGCGCCAAAG CGATCCAAGC TCCGGAACGC AACAGCATGC GTCGCGTCGG 660 
AGAGCCAGCC ACAAGCAGCC GAGAACCGAA CCGGTGGGCG ACGCGTCATG GGACGGACGC 720 
GGGCGACGCT TCCAAACGGG CCACGTACGC CGGCGTGTGC GTGCGTGCAG ACGACAAGCC 780 
AAGGCGAGGC AGCCCCCGAT CGGGAAAGCG TTTTGGGCGC GAGCGCTGGC GTGCGGGTCA 840 
GTCGCTGGTG CGCAGTGCCG GGGGGAACGG GTATCGTGGG GGGCGCGGGC GGAGGAGAGC 900 
GTGGCGAGGG CCGAGAGCAG CGCGCGGCCG GGTCACGCAA CGCGCCCCAC GTACTGCCCT 960 
CCCCCTCCGC GCGCGCTAGA AATACCGAGG CCTGGACCGG GGGGGGGCCC CGTCACATCC 1020
ATCCATCGAC CGATCGATCG CCACAGCCAA CACCACCCGC CGAGGCGACG CGACAGCCGC 1080
CAGGAGGAAG GAATAAACTC ACTGCCAGCC AGTGAAGGGG GAGAAGTGTA CTGCTCCGTC 1140
GACCAGTGCG CGCACCGCCC GGCAGGGCTG CTCATCTCGT CGACGACCAG GTTCTGTTCC 1200 
GATCCGATCC GATCCTGTCC TTGAGTTTCG TCCAGATCCT GGCGCGTATC TGCGTGTTTG 1260 
ATGATCCAGG TTCTTCGAAC CTAAATCTGT CCGTGCACAC GTCTTTTCTC TCTCTCCTAC 1320
GCAGTGGATT AATCGCCATG GCCACCAGCA AGGGCCTCAA GGGTGTGATG GTGTGTTTAC 1380
TTATACTGGG GTTGGTTCTC GAACAGGTGC AAGTAGAAGG CAAGAGTTGC TGCAAGAGTA 1440
CCCTGGGAAG GAAGTGCTAC AACCTTTGCA AAGTCAAAGG CGCCAAGAAG CTTTGCGCAG 1500
GCGTCTGCAA GTGTAAGCTG ACTAGTAGCG GAAAATGCCC GAAAGGCTTC CCCAAATTGG 1560
CCCTTGTGTC CAACTCAGAT GAACCAGACA CCGTCAAGTA TTGCAACTTG GGGTGTAGGG 1620
CTTCCATGTG TGACTACATG GTCAACGCAG CTGCTGACGA CGAAGAAATG AAACTCTATT 1680
TGGAAAATTG TGGTGATGCT TGTGTCAATT TCTGCAACGG TGATGCTGGC CTCACATCCC 1740
TTAGTGCCTA AGTTCGACGT CGGGCCCTCT AGATGCGGCC CGGGTGAAGA GTTCGCCCTG 1800
CAGGGCCCCT GATCTCGCGC GTGGTGCAAA GATGTTGGGA CATCTTCTTA TATATGCTGT 1860
TTCGCTTATG TGATATGGAC AAGTATGTGT AGATGCTTGC TTGTGCTAGT GTAATGTAGT 1920
GTAGTGGTGG CCAGTGGCAC AACCTAATAA GCGCATGAAC TAATTGCTTG CGTGTGTAGT 1980 
TAAGTACCGA TCGGTAATTT TATATTGCGA GTAAATAAAT GGACCTGTAG TGGTGGAGTA 2040 
AATAATCCCT GCTGTTCGGT GTTCTTATCG CTCCTCGTAT AGATATTATA TAGAGTACAT 2100 
TTTTCTCTCT CTGAATCCTA CGTGTGTGAA ATTTCTATAT CATTACTGTA AAATTTCTGC 2160 
GTTCCAAAAG AGACCATAGC CTATCTTTGG CCCTGTTTGT TTCGGCTTCT GGCAGCTTCT 2220 
GGCCACCAAA AGCTGCTGCG GACTGCCAAA CGCTCAGATT TTCAGCTAGC TTCTATAAAA 2280 
TTAGTTGGGG CAAAAACCAT CCAAAATCAA TATAAACACA TAATCGGTTG AGTCGTTGTA 2340 
ATATTAGGAA TCTGTCACTT TCTAGATCCT GAGCCCTATG AACAACTTTA TCTTTCTCCA 2400 
TACGTAATCG TAATGATACT CAGATTCTCT CCACAGCCAG ATTCTCCTCA CAGCCAGATT 2460 
TTCAGAAAAG CTGGTCAGAA AAAAGTTAAA CCAAACAGAC CCTTTGTGTA TGCATGGATC 2520 
GGCTTTCCCC GTCAAGCTCT AAATCGGGGG CTCCCTTTAG GGTTCCGATT TAGAGCTTTA 2580 
CGGCACCTCG ACCGCAAAAA ACTTGATTTG GGTGATGGTT CACGTAGTGG GCCATCGCCC 2640 
TGATAGACGG TTTTTCGCCC TTTGACGTTG GAGTCCACGT TCTTTAATAG TGGACTCTTG 2700 
TTCCAAACTG GAACAACACT CAACCCTATC TCGGTCTATT CTTTTGATTT ATAAGGGATT 2760 
TTGCCGATTT CGGCCTATTG GTTAAAAAAT GAGCTGATTT AACAAATATT TAACGCGAAT 2820 
TTTAACAAAA TATTAACGTT TACAATTTCG CCTGATGCGG TATTTTCTCC TTACGCATCT 2880 
GTGCGGTATT TCACACCGCA TACAGGTGGC ACTTTTCGGG GAAATGTGCG CGGAACCCCT 2940
ATTTGTTTAT TTTTCTAAAT ACATTCAAAT ATGTATCCGC TCATGAGACA ATAACCCTGA 3000
TAAATGCTTC AATAATATTG AAAAAGGAAG AGTATGAGTA TTCAACATTT CCGTGTCGCC 3060
CTTATTCCCT TTTTTGCGGC ATTTTGCCTT CCTGTTTTTG CTCACCCAGA AACGCTGGTG 3120
AAAGTAAAAG ATGCTGAAGA TCAGTTGGGT GCACGAGTGG GTTACATCGA ACTGGATCTC 3180
AACAGCGGTA AGATCCTTGA GAGTTTTCGC CCCGAAGAAC GTTTTCCAAT GATGAGCACT 3240
TTTAAAGTTC TGCTATGTCA TACACTATTA TCCCGTATTG ACGCCGGGCA AGAGCAACTC 3300 
GGTCGCCGGG CGCGGTATTC TCAGAATGAC TTGGTTGAGT ACTCACCAGT CACAGAAAAG 3360 
CATCTTACGG ATGGCATGAC AGTAAGAGAA TTATGCAGTG CTGCCATAAC CATGAGTGAT 3420 
AACACTGCGG CCAACTTACT TCTGACAACG ATCGGAGGAC CGAAGGAGCT AACCGCTTTT 3480 

TTGCACAACA TGGGGGATCA TGTAACTCGC CTTGATCGTT GGGAACCGGA GCTGAATGAA 3540
GCCATACCAA ACGACGAGCG TGACACCACG ATGCCTGTAG CAATGCCAAC AACGTTGCGC 3600 
AAACTATTAA CTGGCGAACT ACTTACTCTA GCTTCCCGGC AACAATTAAT AGACTGGATG 3660 
GAGGCGGATA AAGTTGCAGG ACCACTTCTG CGCTCGGCCC TTCCGGCTGG CTGGTTTATT 3720 
GCTGATAAAT CTGGAGCCGG TGAGCGTGGG TCTCGCGGTA TCATTGCAGC ACTGGGGCCA 3780 

GATGGTAAGC CCTCCCGTAT CGTAGTTATC TACACGACGG GGAGTCAGGC AACTATGGAT 3840
GAACGAAATA GACAGATCGC TGAGATAGGT GCCTCACTGA TTAAGCATTG GTAACTGTCA 3900 
GACCAAGTTT ACTCATATAT ACTTTAGATT GATTTAAAAC TTCATTTTTA ATTTAAAAGG 3960 
ATCTAGGTGA AGATCCTTTT TGATAATCTC ATGACCAAAA TCCCTTAACG TGAGTTTTCG 4020 
TTCCACTGAG CGTCAGACCC CGTAGAAAAG ATCAAAGGAT CTTCTTGAGA TCCTTTTTTT 4080 

CTGCGCGTAA TCTGCTGCTT GCAAACAAAA AAACCACCGC TACCAGCGGT GGTTTGTTTG 4140
CCGGATCAAG AGCTACCAAC TCTTTTTCCG AAGGTAACTG GCTTCAGCAG AGCGCAGATA 4200 
CCAAATACTG TCCTTCTAGT GTAGCCGTAG TTAGGCCACC ACTTCAAGAA CTCTGTAGCA 4260 
CCGCCTACAT ACCTCGCTCT GCTAATCCTG TTACCAGTGG CTGCTGCCAG TGGCGATAAG 4320 
TCGTGTCTTA CCGGGTTGGA CTCAAGACGA TAGTTACCGG ATAAGGCGCA GCGGTCGGGC 4380 

TGAACGGGGG GTTCGTGCAC ACAGCCCAGC TTGGAGCGAA CGACCTACAC CGAACTGAGA 4440
TACCTACAGC GTGAGCTATG AGAAAGCGCC ACGCTTCCCG AAGGGAGAAA GGCGGACAGG 4500 
TATCCGGTAA GCGGCAGGGT CGGAACAGGA GAGCGCACGA GGGAGCTTCC AGGGGGAAAC 4560 
GCCTGGTATC TTTATAGTCC TGTCGGGTTT CGCCACCTCT GACTTGAGCG TCGATTTTTG 4620 
TGATGCTCGT CAGGGGGGCG GAGCCTATCG AAAAACGCCA GCAACGCGGC CTTTTTACGG 4680 

TTCCTGGCCT TTTGCTGGCC TTTTGCTCAC ATGTTCTTTC CTGCGTTATC CCCTGATTCT 4740

GTGGATAACC GTATTACCGC CTTTGAGTGA GCTGATACCG CTCGCCGCAG CCGAACGACC 4800 
GAGCGCAGCG AGTCAGTGAG CGAGGAAGCG GAAGAGCGCC CAATACGCAA ACCGCCTCTC 4860 
CCCGCGCGTT GGCCGATTCA TTAATGCAGC TGGCACGACA GGTTTCCCGA CTGGAAAGCG 4920 
GGCAGTGAGC GCAACGCAAT TAATGTGAGT TAGCTCACTC ATTAGGCACC CCAGGCTTTA 4980 

CACTTTATGC TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA ATTTCACACA 5040
GGAAACAGCT ATGACCATGA TTACGCCAAG CTATTTAGGT GACACTATAG AATACTCAAG 5100
CTATGCATCC AACGC 5115 

(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5392 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60 
ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120
GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180 
CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240 

CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300
CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 
AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420 
CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480 
CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540 

GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600
TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660 
CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720 
ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780 
GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840 

ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900
CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960
ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020
TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080
ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140
CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200
TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260
AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320
ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380
GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440
AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500

TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560
GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTCC AAGAATTTGT TGTATCCTTA 1620
ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680
ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740
TATACATGCT TACAGCTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800

TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860
CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920
TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980
ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040 
TACAGCCGTA TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100

TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160
TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GACCAAGTTC ACAATCCTCC 2220 
TCATCTCTCT TCTCTTCTGC ATCGCCCACA CTTGCAGCGC CTCCAAATGG CAGCACCAGC 2280 
AAGATAGCTG CCGCAAGCAG CTTAAGGGGG TGAACCTCAC GCCCTGCGAG AAGCACATCA 2340 
TGGAGAAGAT CCAAGGCCGC GGCGATGACG ATGATGATGA TGACGACGAC AATCACATTC 2400

TCAGGACCAT GCGGGGGAAG AATCACTACA TACGGAAGAA GGAAGGAAAA GACGAAGACG 2460
AAGAAGAAGA AGGACACATG CAGAAGTGCT GCGCTTTGCA CTGGCATTTG GGGCTCTTAA 2520 
GCTCGCTCAT TTCTGTGCTG CAGAAGATAA TGGAGAACCA GAGCGAGGAA CTGGAGGAGA 2580 
AGGAGAAGAA GAAAATGGAG AAGGAGCTTA TGAACTTGGC TACTATGTGC AGGTTTGGGC 2640 
CCATGATCGG GTGCGACTTG TCCTCCGATG ACTAAGTTGA TCCCCGGCGG TGTCCCCCAC 2700 

TGAAGAAACT ATGTGCTGTA GTATAGCCGC TGGCTAGCTA GCTAGTTGAG TCATTTAGCG 2760
GCGATGATTG AGTAATAATG TGTCACGCAT CACCATGCAT GGGTGGCAGT CTCAGTGTGA 2820 
GCAATGACCT GAATGAACAA TTGAAATGAA AAGAAAAAAG TATTGTTCCA AATTAAACGT 2880 
TTTAACCTTT TAATAGGTTT ATACAATAAT TGATATATGT TTTCTGTATA TGTCTAATTT 2940 
GTTATCATCC ATTTAGATAT AGACGAAAAA AAATCTAAGA ACTAAAACAA ATGCTAATTT 3000 

GAAATGAAGG GAGTATATAT TGGGATAATG TCGATGAGAT CCCTCGTAAT ATCACCGACA 3060
TCACACGTGT CCAGTTAATG TATCAGTGAT ACGTGTATTC ACATTTGTTG CGCGTAGGCG 3120
TACCCAACAA TTTTGATCGA CTATCAGAAA GTCAACGGAA GCGAGTCGAC CTCGAGGGGG 3180
GGCCCGGTAC CCAGCTTTTG TTCCCTTTAG TGAGGGTTAA TTGCGCGCTT GGCGTAATCA 3240 
TGGTCATAGC TGTTTCCTGT GTGAAATTGT TATCCGCTCA CAATTCCACA CAACATACGA 3300 

GCCGGAAGCA TAAAGTGTAA AGCCTGGGGT GCCTAATGAG TGAGCTAACT CACATTAATT 3360
GCGTTGCGCT CACTGCCCGC TTTCCAGTCG GGAAACCTGT CGTGCCAGCT GCATTAATGA 3420
ATCGGCCAAC GCGCGGGGAG AGGCGGTTTG CGTATTGGGC GCTCTTCCGC TTCCTCGCTC 3480 
ACTGACTCGC TGCGCTCGGT CGTTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG 3540 
GTAATACGGT TATCCACAGA ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC 3600 

CAGCAAAAGG CCAGGAACCG TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC 3660
CCCCCTGACG AGCATCACAA AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA 3720 
CTATAAAGAT ACCAGGCGTT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC 3780 
CTGCCGCTTA CCGGATACCT GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAT 3840 
AGCTCACGCT GTAGGTATCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG 3900 

CACGAACCCC CCGTTCAGCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC 3960
AACCCGGTAA GACACGACTT ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA 4020 
GCGAGGTATG TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT 4080 
AGAAGGACAG TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT 4140 
GGTAGCTCTT GATCCGGCAA ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG 4200 

CAGCAGATTA CGCGCAGAAA AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG 4260
TCTGACGCTC AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA 4320 
AGGATCTTCA CCTAGATCCT TTTAAATTAA AAATGAAGTT TTAAATCAAT CTAAAGTATA 4380 
TATGAGTAAA CTTGGTCTGA CAGTTACCAA TGCTTAATCA GTGAGGCACC TATCTCAGCG 4440 
ATCTGTCTAT TTCGTTCATC CATAGTTGCC TGACTCCCCG TCGTGTAGAT AACTACGATA 4500 

CGGGAGGGCT TACCATCTGG CCCCAGTGCT GCAATGATAC CGCGAGACCC ACGCTCACCG 4560
GCTCCAGATT TATCAGCAAT AAACCAGCCA GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT 4620 
GCAACTTTAT CCGCCTCCAT CCAGTCTATT AATTGTTGCC GGGAAGCTAG AGTAAGTAGT 4680 
TCGCCAGTTA ATAGTTTGCG CAACGTTGTT GCCATTGCTA CAGGCATCGT GGTGTCACGC 4740 
TCGTCGTTTG GTATGGCTTC ATTCAGCTCC GGTTCCCAAC GATCAAGGCG AGTTACATGA 4800 

TCCCCCATGT TGTGCAAAAA AGCGGTTAGC TCCTTCGGTC CTCCGATCGT TGTCAGAAGT 4860
AAGTTGGCCG CAGTGTTATC ACTCATGGTT ATGGCAGCAC TGCATAATTC TCTTACTGTC 4920
ATGCCATCCG TAAGATGCTT TTCTGTGACT GGTGAGTACT CAACCAAGTC ATTCTGAGAA 4980
TAGTGTATGC GGCGACCGAG TTGCTCTTGC CCGGCGTCAA TACGGGATAA TACCGCGCCA 5040
CATAGCAGAA CTTTAAAAGT GCTCATCATT GGAAAACGTT CTTCGGGGCG AAAACTCTCA 5100
AGGATCTTAC CGCTGTTGAG ATCCAGTTCG ATGTAACCCA CTCGTGCACC CAACTGATCT 5160
TCAGCATCTT TTACTTTCAC CAGCGTTTCT GGGTGAGCAA AAACAGGAAG GCAAAATGCC 5220
GCAAAAAAGG GAATAAGGGC GACACGGAAA TGTTGAATAC TCATACTCTT CCTTTTTCAA 5280 
TATTATTGAA GCATTTATCA GGGTTATTGT CTCATGAGCG GATACATATT TGAATGTATT 5340 
TAGAAAAATA AACAAATAGG GGTTCCGCGC ACATTTCCCC GAAAAGTGCC AC 5392 

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5173 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CTAAATTGTA AGCGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60 
ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120
GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180

CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240
CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300 
CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 
AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420 
CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCC CATTCGCCAT TCAGGCTGCG 480 

CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC TGGCGAAAGG 540
GGGATGTGCT GCAAGGCGAT TAAGTTGGGT AACGCCAGGG TTTTCCCAGT CACGACGTTG 600 
TAAAACGACG GCCAGTGAGC GCGCGTAATA CGACTCACTA TAGGGCGAAT TGGAGCTCCA 660 
CCGCGGTGGC GGCCGCTCTA GATTATATAA TTTATAAGCT AAACAACCCG GCCCTAAAGC 720
ACTATCGTAT CACCTATCTA AATAAGTCAC GGGAGTTTCG AACGTCCACT TCGTCGCACG 780 

GAATTGCATG TTTCTTGTTG GAAGCATATT CACGCAATCT CCACACATAA AGGTTTATGT 840
ATAAACTTAC ATTTAGCTCA GTTTAATTAC AGTCTTATTT GGATGCATAT GTATGGTTCT 900 
CAATCCATAT AAGTTAGAGT AAAAAATAAG TTTAAATTTT ATCTTAATTC ACTCCAACAT 960 
ATATGGATCT ACAATACTCA TGTGCATCCA AACAAACTAC TTATATTGAG GTGAATTTGG 1020
TAGAAATTAA ACTAACTTAC ACACTAAGCC AATCTTTACT ATATTAAAGC ACCAGTTTCA 1080

ACGATCGTCC CGCGTCAATA TTATTAAAAA ACTCCTACAT TTCTTTATAA TCAACCCGCA 1140
CTCTTATAAT CTCTTCTCTA CTACTATAAT AAGAGAGTTT ATGTACAAAA TAAGGTGAAA 1200
TTATCTATAA GTGTTCTGGA TATTGGTTGT TGGCTCCCAT ATTCACACAA CCTAATCAAT 1260
AGAAAACATA TGTTTTATTA AAACAAAATT TATCATATAT CATATATATA TATATATCAT 1320
ATATATATAT AAACCGTAGC AATGCACGGG CATATAACTA GTGCAACTTA ATACATGTGT 1380 

GTATTAAGAT GAATAAGAGG GTATCCAAAT AAAAAACTTG TTGCTTACGT ATGGATCGAA 1440
AGGGGTTGGA AACGATTAAA CGATTAAATC TCTTCCTAGT CAAAATTGAA TAGAAGGAGA 1500
TTTAATATAT CCCAATCCCC TTCGATCATC CAGGTGCAAC CGTATAAGTC CTAAAGTGGT 1560
GAGGAACACG AAAGAACCAT GCATTGGCAT GTAAAGCTCC AAGAATTTGT TGTATCCTTA 1620
ACAACTCACA GAACATCAAC CAAAATTGCA CGTCAAGGGT ATTGGGTAAG AAACAATCAA 1680

ACAAATCCTC TCTGTGTGCA AAGAAACACG GTGAGTCATG CCGAGATCAT ACTCATCTGA 1740
TATACATGCT TACAGCTCAC AAGACATTAC AAACAACTCA TATTGCATTA CAAAGATCGT 1800
TTCATGAAAA ATAAAATAGG CCGGACAGGA CAAAAATCCT TGACGTGTAA AGTAAATTTA 1860
CAACAAAAAA AAAGCCATAT GTCAAGCTAA ATCTAATTCG TTTTACGTAG ATCAACAACC 1920
TGTAGAAGGC AACAAAACTG AGCCACGCAG AAGTACAGAA TGATTCCAGA TGAACCATCG 1980

ACGTGCTACG TAAAGAGAGT GACGAGTCAT ATACATTTGG CAAGAAACCA TGAAGCTGCC 2040
TACAGCCGTA TCGGTGGCAT AAGAACACAA GAAATTGTGT TAATTAATCA AAGCTATAAA 2100
TAACGCTCGC ATGCCTGTGC ACTTCTCCAT CACCACCACT GGGTCTTCAG ACCATTAGCT 2160
TTATCTACTC CAGAGCGCAG AAGAACCCGA TCGACACCAT GAAGTCGGTG GAGAAGAAAC 2220 
CGAAGGGTGT GAAGACAGGT GCGGGTGACA AGCATAAGCT GAAGACAGAG TGGCCGGAGT 2280 

TGGTGGGGAA ATCGGTGGAG AAAGCCAAGA AGGTGATCCT GAAGGACAAG CCAGAGGCGC 2340
AAATCATAGT TCTACCGGTT GGTACAAAGG TGGGTAAGCA TTATAAGATC GACAAGGTCA 2400 
AGCTTTTTGT GGATAAAAAG GACAACATCG CGCAGGTCCC CAGGGTCGGC TAGCCTCGAG 2460 
ATCCCCGGCG GTGTCCCCCA CTGAAGAAAC TATGTGCTGT AGTATAGCCG CTGGCTAGCT 2520 
AGCTAGTTGA GTCATTTAGC GGCGATGATT GAGTAATAAT GTGTCACGCA TCACCATGCA 2580
TGGGTGGCAG TCTCAGTGTG AGCAATGACC TGAATGAACA ATTGAAATGA AAAGAAAAAA 2640
GTATTGTTCC AAATTAAACG TTTTAACCTT TTAATAGGTT TATACAATAA TTGATATATG 2700
TTTTCTGTAT ATGTCTAATT TGTTATCATC CATTTAGATA TAGACGAAAA AAAATCTAAG 2760
AACTAAAACA AATGCTAATT TGAAATGAAG GGAGTATATA TTGGGATAAT GTCGATGAGA 2820 
TCCCTCGTAA TATCACCGAC ATCACACGTG TCCAGTTAAT GTATCAGTGA TACGTGTATT 2880 
CACATTTGTT GCGCGTAGGC GTACCCAACA ATTTTGATCG ACTATCAGAA AGTCAACGGA 2940
AGCGAGTCGA CCTCGAGGGG GGGCCCGGTA CCCAGCTTTT GTTCCCTTTA GTGAGGGTTA 3000
ATTGCGCGCT TGGCGTAATC ATGGTCATAG CTGTTTCCTG TGTGAAATTG TTATCCGCTC 3060 
ACAATTCCAC ACAACATACG AGCCGGAAGC ATAAAGTGTA AAGCCTGGGG TGCCTAATGA 3120
GTGAGCTAAC TCACATTAAT TGCGTTGCGC TCACTGCCCG CTTTCCAGTC GGGAAACCTG 3180
TCGTGCCAGC TGCATTAATG AATCGGCCAA CGCGCGGGGA GAGGCGGTTT GCGTATTGGG 3240 

CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG TCGTTCGGCT GCGGCGAGCG 3300

GTATCAGCTC ACTCAAAGGC GGTAATACGG TTATCCACAG AATCAGGGGA TAACGCAGGA 3360 
AAGAACATGT GAGCAAAAGG CCAGCAAAAG GCCAGGAACC GTAAAAAGGC CGCGTTGCTG 3420
GCGTTTTTCC ATAGGCTCCG CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG 3480 
AGGTGGCGAA ACCCGACAGG ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC 3540 

GTGCGCTCTC CTGTTCCGAC CCTGCCGCTT ACCGGATACC TGTCCGCCTT TCTCCCTTCG 3600
GGAAGCGTGG CGCTTTCTCA TAGCTCACGC TGTAGGTATC TCAGTTCGGT GTAGGTCGTT 3660 
CGCTCCAAGC TGGGCTGTGT GCACGAACCC CCCGTTCAGC CCGACCGCTG CGCCTTATCC 3720 
GGTAACTATC GTCTTGAGTC CAACCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC 3780
ACTGGTAACA GGATTAGCAG AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG 3840

TGGCCTAACT ACGGCTACAC TAGAAGGACA GTATTTGGTA TCTGCGCTCT GCTGAAGCCA 3900
GTTACCTTCG GAAAAAGAGT TGGTAGCTCT TGATCCGGCA AACAAACCAC CGCTGGTAGC 3960
GGTGGTTTTT TTGTTTGCAA GCAGCAGATT ACGCGCAGAA AAAAAGGATC TCAAGAAGAT 4020
CCTTTGATCT TTTCTACGGG GTCTGACGCT CAGTGGAACG AAAACTCACG TTAAGGGATT 4080 
TTGGTCATGA GATTATCAAA AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT 4140

TTTAAATCAA TCTAAAGTAT ATATGAGTAA ACTTGGTCTG ACAGTTACCA ATGCTTAATC 4200
AGTGAGGCAC CTATCTCAGC GATCTGTCTA TTTCGTTCAT CCATAGTTGC CTGACTCCCC 4260 
GTCGTGTAGA TAACTACGAT ACGGGAGGGC TTACCATCTG GCCCCAGTGC TGCAATGATA 4320 
CCGCGAGACC CACGCTCACC GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG 4380
GCCGAGCGCA GAAGTGGTCC TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC 4440

CGGGAAGCTA GAGTAAGTAG TTCGCCAGTT AATAGTTTGC GCAACGTTGT TGCCATTGCT 4500
ACAGGCATCG TGGTGTCACG CTCGTCGTTT GGTATGGCTT CATTCAGCTC CGGTTCCCAA 4560 
CGATCAAGGC GAGTTACATG ATCCCCCATG TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT 4620
CCTCCGATCG TTGTCAGAAG TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA 4680 
CTGCATAATT CTCTTACTGT CATGCCATCC GTAAGATGCT TTTCTGTGAC TGGTGAGTAC 4740 

TCAACCAAGT CATTCTGAGA ATAGTGTATG CGGCGACCGA GTTGCTCTTG CCCGGCGTCA 4800
ATACGGGATA ATACCGCGCC ACATAGCAGA ACTTTAAAAG TGCTCATCAT TGGAAAACGT 4860 
TCTTCGGGGC GAAAACTCTC AAGGATCTTA CCGCTGTTGA GATCCAGTTC GATGTAACCC 4920 
ACTCGTGCAC CCAACTGATC TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA 4980 
AAAACAGGAA GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA 5040

CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC AGGGTTATTG TCTCATGAGC 5100

GGATACATAT TTGAATGTAT TTAGAAAAAT AAACAAATAG GGGTTCCGCG CACATTTCCC 5160
CGAAAAGTGC CAC 5173

(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 

AGTATAAGTA AACACACCAT CACACCCTTG AGGCCCTTGC TGGTGGCCAT GGTG 54 
(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CCTCACATCC CTTAGTGCCT AAGTTCGACG TCGGGCCCTC TAGTCGACGG ATCCA 55 

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Other 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
AGCGGAAAAT GCCCGAAAGG CTTCCCCAAA TTGGC 35 
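
Each entry in this listing declares a LENGTH that the listed bases should match. The following is a minimal sketch of such a check, assuming each sequence line holds space-separated groups of bases followed by a running position counter. The function names `parse_listing_block` and `check_declared_length` are hypothetical and are not part of the listing, and the sketch assumes only unambiguous A, C, G and T bases (any other character, including an ambiguity code such as N, is discarded as layout).

```python
import re


def parse_listing_block(lines):
    """Join sequence-listing lines into one uppercase string of bases.

    Assumption: everything that is not A, C, G or T (spaces, position
    counters, stray OCR characters) is layout and can be dropped.
    """
    parts = [re.sub(r"[^ACGTacgt]", "", line) for line in lines]
    return "".join(parts).upper()


def check_declared_length(lines, declared_length):
    """Return the parsed sequence and whether it has the declared length."""
    seq = parse_listing_block(lines)
    return seq, len(seq) == declared_length


# SEQ ID NO:10 as listed above, declared as 35 base pairs.
seq10_lines = ["AGCGGAAAAT GCCCGAAAGG CTTCCCCAAA TTGGC 35"]
seq10, ok = check_declared_length(seq10_lines, 35)
print(len(seq10), ok)
```

The same check can be run on the longer entries by passing all of their sequence lines in order. A mismatch would point to a transcription or extraction error rather than to a property of the sequence itself.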
(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

TGCGCAGGCG TCTGCAAGTG TAAGCTGACT AGTAGCGGAA AATGC 45 
(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Other 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

TACAACCTTT GCAAAGTCAA AGGCGCCAAG AAGCTTTGCG CAGGCGTCTG 50 
(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Other 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
GCAAGAGTTG CTGCAAGAGT ACCCTGGGAA GGAAGTGCTA CAACCTTTGC 50